CN114359041A - Light field image space super-resolution reconstruction method
Abstract
The invention discloses a light field image spatial super-resolution reconstruction method that constructs a spatial super-resolution network comprising an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder, among other components. The encoder extracts multi-scale features from an up-sampled low-spatial-resolution light field image, a 2D high-resolution image and its blurred version; the aperture-level feature registration module learns the correspondence between the 2D high-resolution features and the low-resolution light field features so as to register the 2D high-resolution features to each sub-aperture image and form registered high-resolution light field features; the light field feature enhancement module uses the registered high-resolution light field features to enhance the extracted shallow light field features, yielding enhanced high-resolution light field features; and the decoder reconstructs the enhanced high-resolution light field features into a high-spatial-resolution light field image. The method has the advantages that a high-spatial-resolution light field image can be reconstructed with high quality and that texture and detail information can be recovered.
Description
Technical Field
The invention relates to an image super-resolution reconstruction technology, in particular to a light field image space super-resolution reconstruction method.
Background
Unlike conventional digital cameras, light field cameras can capture both the intensity (i.e., spatial information) and the direction (i.e., angular information) of light rays in a scene, thereby recording the real world more faithfully. The rich information contained in the 4-Dimensional (4D) light field images acquired by light field cameras facilitates many applications, such as refocusing, depth estimation, and virtual/augmented reality. Current commercial light field cameras employ microlens arrays to separate light rays in different directions that pass through the same scene point, and then simultaneously record spatial and angular information on the sensor plane. However, because the sensor resolution shared by the spatial and angular dimensions is limited, the spatial resolution of the acquired 4D light field image is inevitably reduced when high angular sampling (i.e., high angular resolution) is provided. Improving the spatial resolution of 4D light field images has therefore become an important problem in light field research.
In general, a 4D light field image has several interconvertible visual representations, such as the Sub-Aperture Image (SAI) array based on 2-Dimensional (2D) spatial information, the Micro-Lens Image (MLI) array based on 2D angular information, and the Epipolar Plane Image (EPI), which combines 1D spatial and 1D angular information. Intuitively, increasing the spatial resolution of a 4D light field image amounts to increasing the resolution of each 2D SAI in the 4D light field image. It is therefore straightforward to apply existing super-resolution reconstruction methods for 2D images, such as the deep back-projection network proposed by Haris et al. or the deep Laplacian pyramid network proposed by Lai et al., to each SAI independently; however, this approach ignores the information embedded in the angular domain of the 4D light field image, and it is difficult to guarantee the angular consistency of the super-resolution result. The key to designing a 4D light field image spatial super-resolution reconstruction method is therefore to exploit the high-dimensional structural characteristics of the 4D light field image. Current spatial super-resolution reconstruction methods for 4D light field images can be broadly classified into two categories: optimization-based and learning-based.
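For readers implementing these representations, the following is a minimal sketch (not part of the patent) of converting a stack of sub-aperture images into the tiled SAI-array layout used in the description below and into a macro-pixel (micro-lens image) layout; the exact tiling conventions chosen here are assumptions for illustration.

```python
import torch

def tile_sai_array(lf: torch.Tensor) -> torch.Tensor:
    """[V, U, H, W] sub-aperture images -> one [H*U, W*V] image tiling the SAIs
    (U tiles vertically, V tiles horizontally), as used for the network inputs below."""
    V, U, H, W = lf.shape
    return lf.permute(1, 2, 0, 3).reshape(U * H, V * W)

def to_macropixel(lf: torch.Tensor) -> torch.Tensor:
    """[V, U, H, W] sub-aperture images -> [H*V, W*U] micro-lens image, where each
    spatial position (h, w) holds its V x U block of angular samples."""
    V, U, H, W = lf.shape
    return lf.permute(2, 0, 3, 1).reshape(H * V, W * U)

lf = torch.rand(5, 5, 32, 48)   # V = U = 5 angular samples, 32 x 48 spatial samples
sai_array, mli = tile_sai_array(lf), to_macropixel(lf)
```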
Optimization-based methods typically utilize estimated disparity or depth information to model the relationship between SAIs of 4D light-field images, thereby representing 4D light-field image spatial super-resolution reconstruction as an optimization problem. However, the disparity or depth information inferred from low spatial resolution light-field images is not very reliable and hence optimization-based methods exhibit rather limited performance.
Learning-based methods explore the intrinsic high-dimensional structure of the 4D light field image in a data-driven manner and thus learn a non-linear mapping between low-spatial-resolution and high-spatial-resolution light field images. For example, Yeung et al. iteratively exploit the spatial and angular information of the 4D light field image using spatial-angular separable convolutions. Wang et al. developed a spatial-angular interaction network to fuse the spatial and angular information of 4D light field images. Jin et al. propose a novel fusion mechanism to exploit the complementary information between SAIs and recover the parallax details of 4D light field images through a two-stage network. Although these methods achieve good performance at small reconstruction scales (e.g., 2x), they still fail to recover sufficient texture and detail information at large reconstruction scales (e.g., 8x). This is because low-resolution light field images contain limited spatial and angular information, so the details lost at low resolution can only be inferred from information within the 4D light field image itself. Boominathan et al. propose a spatial super-resolution reconstruction method using hybrid-input light field images, which improves the spatial resolution of the 4D light field image by introducing an additional high-resolution 2D image as supplementary information; however, the averaging-based fusion mechanism in that method easily blurs the reconstruction result, and processing each SAI independently destroys the parallax structure of the reconstructed light field image.
In summary, although existing research has achieved good spatial super-resolution reconstruction of light field images at small reconstruction scales, it still falls short at large reconstruction scales (such as 8x); in particular, there remains room for improvement in recovering high-frequency texture information in the reconstructed light field images, avoiding visual artifacts, and preserving the parallax structure.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a light field image spatial super-resolution reconstruction method that combines a light field camera and a conventional 2D camera into a heterogeneous imaging system: the light field camera provides rich angular information but limited spatial information, while the conventional 2D camera records only the intensity of light and thus provides sufficient spatial information. By fully exploiting the angular and spatial information acquired by the two cameras, the method can reconstruct a high-spatial-resolution light field image with high quality, recover the texture and detail information of the reconstructed light field image, avoid ghosting artifacts caused by parallax, and preserve the parallax structure.
The technical scheme adopted by the invention for solving the technical problems is as follows: a light field image space super-resolution reconstruction method is characterized by comprising the following steps:
Step 1: select Num low-spatial-resolution light field images with three color channels, spatial resolution W×H and angular resolution V×U, the corresponding Num 2D high-resolution images with three color channels and resolution αW×αH, and the corresponding Num reference high-spatial-resolution light field images with three color channels, spatial resolution αW×αH and angular resolution V×U; where Num > 1, α denotes the spatial resolution improvement factor, and α > 1;
Step 2: construct a convolutional neural network as the spatial super-resolution network. The spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features with the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features with the 2D high-resolution features, a spatial attention block for mitigating registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image;
For the encoder: it consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array of width α_sW×V and height α_sH×U obtained by recombining, after spatial-resolution up-sampling, the single-channel image L_LR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; the single-channel blurred 2D high-resolution image of width α_sW and height α_sH; and the single-channel 2D high-resolution image of width α_sW and height α_sH, denoted I_HR. For each of these three inputs the first convolutional layer outputs 64 feature maps of the same width and height as that input; the three sets of output feature maps are, respectively, the scale-0 light field features, the scale-0 blurred high-resolution features and the set denoted Y_HR,0. The input end of the second convolutional layer receives these three sets in parallel and, since its convolution stride is 2, outputs for each of them 64 feature maps of half the width and half the height; the three output sets are the scale-1 light field features, the scale-1 blurred high-resolution features and the set denoted Y_HR,1. The input end of the first residual block receives these three sets in parallel and outputs for each of them 64 feature maps of the same width and height; the three output sets are the scale-2 light field features, the scale-2 blurred high-resolution features and the set denoted Y_HR,2. The input end of the second residual block receives the three scale-2 sets in parallel and outputs for each of them 64 feature maps of the same width and height; the three output sets are the scale-3 light field features, the scale-3 blurred high-resolution features and the set denoted Y_HR,3. Here, the sub-aperture image array fed to the encoder is obtained by bicubic-interpolation up-sampling of the single-channel image L_LR, followed by recombination into an array of width α_sW×V and height α_sH×U; the blurred 2D high-resolution image is obtained by first bicubic-interpolation down-sampling and then bicubic-interpolation up-sampling of I_HR; α_s denotes the spatial-resolution sampling factor, with α = α_s³, and both the up-sampling factor of the bicubic-interpolation up-sampling and the down-sampling factor of the bicubic-interpolation down-sampling are taken as α_s. The convolution kernel size of the first convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 1 and its number of output channels is 64; the convolution kernel size of the second convolutional layer is 3×3, its convolution stride is 2, its number of input channels is 64 and its number of output channels is 64; the activation function of both the first and the second convolutional layer is ReLU;
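As an illustration only, the following PyTorch sketch mirrors the encoder just described (3×3 stride-1 convolution, 3×3 stride-2 convolution, two residual blocks), applied with shared weights to each of the three single-channel inputs; the class names, padding choice and example input sizes are assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (ReLU after the first only)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Encoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, 1, 1)    # first convolutional layer, stride 1
        self.conv2 = nn.Conv2d(ch, ch, 3, 2, 1)   # second convolutional layer, stride 2
        self.res1, self.res2 = ResBlock(ch), ResBlock(ch)
    def forward(self, x):
        f0 = torch.relu(self.conv1(x))            # scale-0 features (input size)
        f1 = torch.relu(self.conv2(f0))           # scale-1 features (half size)
        f2 = self.res1(f1)                        # scale-2 features
        f3 = self.res2(f2)                        # scale-3 features
        return f0, f1, f2, f3

enc = Encoder()
lf_up   = torch.rand(1, 1, 160, 160)   # up-sampled SAI array (illustrative size)
hr_blur = torch.rand(1, 1, 32, 32)     # blurred 2D HR image (illustrative size)
hr      = torch.rand(1, 1, 32, 32)     # 2D HR image I_HR
y_lf, y_blur, y_hr = (enc(t) for t in (lf_up, hr_blur, hr))  # same weights for all three inputs
```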
For the aperture-level feature registration module: its input end receives three types of feature maps. The first type is the light field features extracted by the encoder at the coarse scale, the second type is the blurred high-resolution features at the same scale, and the third type comprises four inputs, namely all feature maps in Y_HR,0, all feature maps in Y_HR,1, all feature maps in Y_HR,2 and all feature maps in Y_HR,3. In the aperture-level feature registration module, first the blurred high-resolution features and all feature maps in Y_HR,0, Y_HR,1, Y_HR,2 and Y_HR,3 are each replicated by a factor of V×U, so that the widths and heights of the replicated blurred high-resolution features and of the replicated Y_HR,1, Y_HR,2 and Y_HR,3 match those of the corresponding-scale light field features, while the width and height of the replicated Y_HR,0 become α_sW×V and α_sH×U, matching those of the scale-0 light field features. Then block matching is performed between the coarse-scale light field features and the replicated blurred high-resolution features; after block matching, one coordinate index map is obtained, denoted P_CI. Then, according to P_CI, all feature maps in the replicated Y_HR,1 are registered in spatial position with the scale-1 light field features, yielding 64 registration feature maps; the set of all obtained registration feature maps is denoted F_Align,1. Likewise, according to P_CI, all feature maps in the replicated Y_HR,2 are registered in spatial position with the scale-2 light field features, yielding 64 registration feature maps whose set is denoted F_Align,2, and all feature maps in the replicated Y_HR,3 are registered in spatial position with the scale-3 light field features, yielding 64 registration feature maps whose set is denoted F_Align,3. Next, bicubic-interpolation up-sampling is applied to P_CI to obtain a coordinate index map of width α_sW×V and height α_sH×U; finally, according to this up-sampled coordinate index map, all feature maps in the replicated Y_HR,0 are registered in spatial position with the scale-0 light field features, yielding 64 registration feature maps of width α_sW×V and height α_sH×U, whose set is denoted F_Align,0. The output of the aperture-level feature registration module consists of all feature maps in F_Align,0, F_Align,1, F_Align,2 and F_Align,3. The similarity measure used for block matching is the texture and structure similarity index, the block size used for block matching is 3×3, and the up-sampling factor of the bicubic-interpolation up-sampling is α_s;
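The registration step can be pictured with the hedged sketch below: assuming a coordinate index map P_CI that stores, for every position of the light field features, the flattened index of its best-matching position in the replicated high-resolution features (found by 3×3 block matching under the texture-and-structure similarity measure, which is omitted here), the high-resolution feature maps are warped by gathering at those indices.

```python
import torch

def warp_by_index(feat_hr: torch.Tensor, p_ci: torch.Tensor) -> torch.Tensor:
    """feat_hr: [C, H, W] replicated high-resolution features; p_ci: [H, W] long tensor of
    flattened target indices into H*W. Gathers, for each output position, the HR feature
    at its matched position, producing one level of registered features F_Align."""
    C, H, W = feat_hr.shape
    idx = p_ci.reshape(1, -1).expand(C, -1)               # [C, H*W]
    aligned = torch.gather(feat_hr.reshape(C, -1), 1, idx)
    return aligned.reshape(C, H, W)

feat_hr = torch.rand(64, 40, 60)
p_ci = torch.randint(0, 40 * 60, (40, 60))                # stand-in for the block-matching result
f_align = warp_by_index(feat_hr, p_ci)
```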
For the shallow feature extraction layer: it consists of one fifth convolutional layer. The input end of the fifth convolutional layer receives the single-channel image L_LR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, and the output end of the fifth convolutional layer outputs 64 feature maps of width W×V and height H×U; the set of all output feature maps is denoted F_LR. The convolution kernel size of the fifth convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 1 and its number of output channels is 64, and the activation function of the fifth convolutional layer is ReLU;
For the light field feature enhancement module: it consists of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block connected in sequence. The input end of the first enhancement residual block receives all feature maps in F_Align,1 and all feature maps in F_LR, and its output end outputs 64 feature maps of the same width and height as those in F_LR; the set of all output feature maps is denoted F_En,1. The input end of the second enhancement residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_En,2. The input end of the third enhancement residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_En,3;
For the spatial attention block: it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input end of the sixth convolutional layer receives all feature maps in F_Align,0, and its output end outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U; the set of all output spatial attention feature maps is denoted F_SA1. The input end of the seventh convolutional layer receives all feature maps in F_SA1, and its output end outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U; the set of all output spatial attention feature maps is denoted F_SA2. All feature maps in F_Align,0 and all spatial attention feature maps in F_SA2 are multiplied element by element, and the set of all resulting feature maps is denoted F_WA,0; F_WA,0 is taken as the set of all feature maps output by the output end of the spatial attention block. The convolution kernel sizes of the sixth and seventh convolutional layers are both 3×3, their convolution strides are both 1, their numbers of input channels are 64 and their numbers of output channels are 64; the activation function of the sixth convolutional layer is ReLU and that of the seventh convolutional layer is Sigmoid;
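A minimal sketch of this spatial attention block (assuming 3×3 padding-1 convolutions and element-wise gating of F_Align,0) could look as follows.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sixth (ReLU) and seventh (Sigmoid) 3x3 convolutional layers; the sigmoid map gates
    F_Align,0 element by element to give F_WA,0."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.conv7 = nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, f_align0):
        f_sa1 = torch.relu(self.conv6(f_align0))
        f_sa2 = torch.sigmoid(self.conv7(f_sa1))
        return f_align0 * f_sa2

f_wa0 = SpatialAttention()(torch.rand(1, 64, 80, 100))
```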
For the decoder: it consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input end of the third residual block receives all feature maps in F_En,3, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_Dec,1. The input end of the fourth residual block receives all feature maps in F_Dec,1, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_Dec,2. The input end of the sub-pixel convolutional layer receives all feature maps in F_Dec,2; its output end outputs 256 feature maps of the same width and height, and these 256 feature maps are further converted into 64 feature maps of width α_sW×V and height α_sH×U; the set of all converted feature maps is denoted F_Dec,3. The input end of the eighth convolutional layer receives the result of element-by-element addition of all feature maps in F_Dec,3 and all feature maps in F_WA,0, and its output end outputs 64 feature maps of width α_sW×V and height α_sH×U; the set of all output feature maps is denoted F_Dec,4. The input end of the ninth convolutional layer receives all feature maps in F_Dec,4, and its output end outputs one reconstructed single-channel light field image of width α_sW×V and height α_sH×U; this reconstructed single-channel light field image is recombined into a high-spatial-resolution single-channel light field image with spatial resolution α_sW×α_sH and angular resolution V×U, denoted L_SR. The convolution kernel size of the sub-pixel convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 256; the convolution kernel size of the eighth convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 64; the convolution kernel size of the ninth convolutional layer is 1×1, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 1; the activation function of both the sub-pixel convolutional layer and the eighth convolutional layer is ReLU, and the ninth convolutional layer uses no activation function;
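The decoder can be sketched as below; the sub-pixel convolution is realized with nn.PixelShuffle, and the up-scaling factor 2 (i.e., α_s = 2, three cascaded 2x stages for an overall 8x) is an assumption for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1, self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1), nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Decoder(nn.Module):
    def __init__(self, ch=64, scale=2):
        super().__init__()
        self.res3, self.res4 = ResBlock(ch), ResBlock(ch)
        self.subpixel = nn.Sequential(
            nn.Conv2d(ch, ch * scale * scale, 3, 1, 1),   # sub-pixel convolutional layer: 64 -> 256
            nn.PixelShuffle(scale),                       # back to 64 channels at 2x the size
            nn.ReLU(inplace=True))
        self.conv8 = nn.Conv2d(ch, ch, 3, 1, 1)           # eighth convolutional layer (ReLU)
        self.conv9 = nn.Conv2d(ch, 1, 1)                  # ninth convolutional layer (1x1, no activation)
    def forward(self, f_en3, f_wa0):
        f_dec3 = self.subpixel(self.res4(self.res3(f_en3)))
        f_dec4 = torch.relu(self.conv8(f_dec3 + f_wa0))   # element-wise skip addition of F_WA,0
        return self.conv9(f_dec4)                         # reconstructed single-channel SAI array

l_sr = Decoder()(torch.rand(1, 64, 50, 75), torch.rand(1, 64, 100, 150))
```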
Step 3: perform color space conversion on each low-spatial-resolution light field image in the training set, the corresponding 2D high-resolution image and the corresponding reference high-spatial-resolution light field image, i.e., convert them from the RGB color space to the YCbCr color space, and extract the Y-channel images; recombine the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U for representation; then form the training set from the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images; then construct a pyramid network and train it with the training set, the specific process being as follows:
Step 3_1: copy the constructed spatial super-resolution network three times and cascade the copies, with the weights of the spatial super-resolution networks shared, i.e., all their parameters identical; define the whole network formed by the three spatial super-resolution networks as the pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to α_s;
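A hedged sketch of the weight sharing in step 3_1: the same spatial super-resolution module instance is applied three times in cascade, so all three pyramid levels use identical parameters. The stand-in network `sr_net`, its argument order, and the use of whole-array bicubic up-sampling (the description up-samples each sub-aperture image before recombination) are simplifications.

```python
import torch.nn as nn
import torch.nn.functional as F

class PyramidNet(nn.Module):
    """Cascade one spatial super-resolution network three times; reusing the same module
    instance means the three pyramid levels share all parameters."""
    def __init__(self, sr_net: nn.Module, alpha_s: int = 2, levels: int = 3):
        super().__init__()
        self.sr_net = sr_net
        self.alpha_s, self.levels = alpha_s, levels
    def forward(self, lf_y, hr_y_levels, blur_hr_y_levels):
        outputs, current = [], lf_y
        for k in range(self.levels):
            # One bicubic up-sampling of the current light field before each stage
            # (applied to the whole array here for brevity).
            up = F.interpolate(current, scale_factor=self.alpha_s,
                               mode='bicubic', align_corners=False)
            current = self.sr_net(current, up, blur_hr_y_levels[k], hr_y_levels[k])
            outputs.append(current)   # alpha_s-, alpha_s^2-, alpha_s^3-times reconstructions
        return outputs
```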
Step 3_2: down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set twice in spatial resolution, and take the resulting image as the label image; down-sample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and take the resulting image as the 2D high-resolution Y-channel image for the first spatial super-resolution network in the pyramid network; then input into the first spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those Y-channel images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the first spatial super-resolution network, and all the 2D high-resolution Y-channel images for the first spatial super-resolution network; this yields the α_s-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
Step 3_3: down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set once in spatial resolution, and take the resulting image as the label image; down-sample the Y-channel image of each 2D high-resolution image in the training set once in the same way, and take the resulting image as the 2D high-resolution Y-channel image for the second spatial super-resolution network in the pyramid network; then input into the second spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the α_s-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those reconstructed light field images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the second spatial super-resolution network, and all the 2D high-resolution Y-channel images for the second spatial super-resolution network; this yields the α_s²-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
Step 3_4: take the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image; take the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then input into the third spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the α_s²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those reconstructed light field images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the third spatial super-resolution network, and all the 2D high-resolution Y-channel images for the third spatial super-resolution network; this yields the α_s³-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
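Steps 3_2 to 3_4 derive their level-specific inputs and labels from bicubic re-sampling; the sketch below shows one plausible way to generate, for a given pyramid level, the 2D high-resolution Y-channel image and its blurred counterpart, assuming α_s = 2 and three levels.

```python
import torch
import torch.nn.functional as F

def bicubic(x: torch.Tensor, scale: float) -> torch.Tensor:
    return F.interpolate(x, scale_factor=scale, mode='bicubic', align_corners=False)

def level_inputs(hr_y: torch.Tensor, level: int, levels: int = 3, alpha_s: int = 2):
    """hr_y: [N, 1, alpha*H, alpha*W] Y channel of the full-resolution 2D HR image.
    Returns the 2D HR Y image used at this pyramid level (level 0, 1 or 2) and its
    blurred counterpart (one down-sampling followed by one up-sampling)."""
    hr_level = hr_y
    for _ in range(levels - 1 - level):          # two down-samplings for level 0, one for level 1
        hr_level = bicubic(hr_level, 1.0 / alpha_s)
    blur_level = bicubic(bicubic(hr_level, 1.0 / alpha_s), float(alpha_s))
    return hr_level, blur_level

hr_y = torch.rand(1, 1, 256, 256)
hr_l0, blur_l0 = level_inputs(hr_y, level=0)     # inputs for the first spatial SR network
```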
After the training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, giving the trained spatial super-resolution network model;
Step 4: randomly select a low-spatial-resolution light field image with three color channels and the corresponding 2D high-resolution image with three color channels as test images; convert both from the RGB color space to the YCbCr color space and extract the Y-channel images; recombine the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array for representation; input into the trained spatial super-resolution network model the Y-channel image of the low-spatial-resolution light field image (as a sub-aperture image array, together with its once up-sampled version), the blurred 2D high-resolution Y-channel image obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the Y-channel image of the 2D high-resolution image, and the Y-channel image of the 2D high-resolution image, and test to obtain the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image; then up-sample the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image by bicubic interpolation to obtain the reconstructed high-spatial-resolution Cb-channel and Cr-channel light field images corresponding to the Cb-channel and Cr-channel images of the low-spatial-resolution light field image; finally, concatenate the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension and convert the result back to the RGB color space, obtaining the reconstructed high-spatial-resolution light field image with three color channels corresponding to the low-spatial-resolution light field image.
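Step 4 can be summarized by the hedged sketch below: a BT.601-style YCbCr conversion (the exact color matrix used in the patent is not specified here), the trained model applied to the Y channel, bicubic up-sampling of Cb and Cr, and conversion back to RGB; `model` and `scale` are placeholders, and the model's full input list (high-resolution image, blurred image, etc.) is abbreviated to the Y channel for brevity.

```python
import torch
import torch.nn.functional as F

def rgb_to_ycbcr(x):                                       # x: [N, 3, H, W] in [0, 1]
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    cb, cr = cb - 0.5, cr - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return torch.cat([r, g, b], dim=1)

def reconstruct(lf_rgb, model, scale):
    y, cb, cr = rgb_to_ycbcr(lf_rgb)
    y_sr = model(y)                                         # trained spatial SR network (placeholder call)
    up = lambda c: F.interpolate(c, scale_factor=scale, mode='bicubic', align_corners=False)
    return ycbcr_to_rgb(y_sr, up(cb), up(cr)).clamp(0, 1)
```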
In step 2, the first, second, third and fourth residual blocks have the same structure: each consists of a third convolutional layer and a fourth convolutional layer connected in sequence, together with a skip connection that adds the block input to the output of the fourth convolutional layer element by element.

The input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely the scale-1 light field features, the scale-1 blurred high-resolution features and all feature maps in Y_HR,1, and for each of them its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the first residual block receives these three sets in parallel and for each of them its output end outputs 64 feature maps of the same width and height. For each of the three streams, the output of the fourth convolutional layer is added element by element to the corresponding input of the first residual block, and the three resulting sets of feature maps are taken as the outputs of the first residual block, namely the scale-2 light field features, the scale-2 blurred high-resolution features and Y_HR,2.

The second residual block processes the scale-2 light field features, the scale-2 blurred high-resolution features and all feature maps in Y_HR,2 in exactly the same way, and its three output sets are the scale-3 light field features, the scale-3 blurred high-resolution features and Y_HR,3.

The input end of the third convolutional layer in the third residual block receives all feature maps in F_En,3 and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the third residual block receives these feature maps and its output end outputs 64 feature maps of the same width and height; all feature maps in F_En,3 are added element by element to the output of the fourth convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the third residual block, whose set is F_Dec,1.

The fourth residual block processes F_Dec,1 in the same way, and the set of all feature maps output by its output end is F_Dec,2.

In the above, the convolution kernel sizes of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks are all 3×3, the convolution strides are all 1, the numbers of input channels are all 64 and the numbers of output channels are all 64; in each residual block the third convolutional layer uses the ReLU activation function and the fourth convolutional layer uses no activation function.
In step 2, the first enhancement residual block, the second enhancement residual block and the third enhancement residual block have the same structure. Each consists of a first spatial feature transform layer, a first spatial-angular convolutional layer, a second spatial feature transform layer, a second spatial-angular convolutional layer and a channel attention layer connected in sequence. The first and second spatial feature transform layers have the same structure and each consist of a tenth convolutional layer and an eleventh convolutional layer arranged in parallel; the first and second spatial-angular convolutional layers have the same structure and each consist of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence; and the channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence.
first increaseThe input of the tenth convolutional layer in the first spatial feature transform layer in the strong residual block receives FAlign,1The output end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the first spatial feature transform layer in the first enhanced residual block receives FLRAll feature maps in (1), will FLRAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained all feature maps are used as the output of the first spatial feature transform layer in the first enhanced residual blockAll feature maps outputted from the output end are described as a set of these feature maps
An input of a twelfth of the first spatial angle convolutional layers in the first enhanced residual block receivesOf the first spatial angle convolutional layer in the first enhancement residual block, the output end of the twelfth convolutional layer of the first spatial angle convolutional layer outputs 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the first enhancement residual block outputs 64 widths as the result of the reorganization operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, taking all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a first space angle convolution layer in a first enhanced residual block, and recording a set formed by the feature maps as a set
The input terminal of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the second spatial feature transform layer in the first enhanced residual block receivesAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the first enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layers in the first enhancement residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the first enhanced residual error block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming recombination operation from angle dimension to space dimension on all feature maps in the first enhancement residual block, taking all feature maps obtained after the recombination operation as all feature maps output by the output end of the second spatial angle convolution layer in the first enhancement residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the first enhanced residual block outputs 64 feature maps with the width ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,1,FGAP,1All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FGAP,1The output end of the fourteenth convolution layer in the channel attention layer in the first enhancement residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,1(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FDS,1Of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,1(ii) a F is to beUS,1All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, and a set formed by the feature maps is marked as FCA,1;
F is to beCA,1All feature maps in (1) and (F)LRAll feature maps in (1) are added element by element, and all obtained feature maps are used as the output end of the first enhanced residual blockAll the output feature maps, and the set formed by the feature maps is FEn,1;
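Putting the pieces of the first enhancement residual block together, the following sketch shows one possible realization of the spatial feature transform layers, spatial-angular convolutional layers and channel attention layer. The kernel sizes of the tenth to fifteenth convolutional layers, the ReLU between the fourteenth and fifteenth convolutional layers and the angular resolution U = V = 5 are assumptions; the 64-to-4-to-64 channel attention bottleneck follows the description above.

```python
import torch
import torch.nn as nn

def spatial_to_angular(x, U, V):
    """[N, C, U*H, V*W] SAI-tiled layout -> [N, C, H*U, W*V] macro-pixel layout."""
    N, C, UH, VW = x.shape
    H, W = UH // U, VW // V
    x = x.reshape(N, C, U, H, V, W).permute(0, 1, 3, 2, 5, 4)   # [N, C, H, U, W, V]
    return x.reshape(N, C, H * U, W * V)

def angular_to_spatial(x, U, V):
    """Inverse of spatial_to_angular."""
    N, C, HU, WV = x.shape
    H, W = HU // U, WV // V
    x = x.reshape(N, C, H, U, W, V).permute(0, 1, 3, 2, 5, 4)   # [N, C, U, H, V, W]
    return x.reshape(N, C, U * H, V * W)

class SFTLayer(nn.Module):
    """Spatial feature transform: modulate x as gamma(F_Align) * x + beta(F_Align)."""
    def __init__(self, ch=64):
        super().__init__()
        self.gamma = nn.Conv2d(ch, ch, 3, 1, 1)   # tenth convolutional layer (kernel size assumed)
        self.beta  = nn.Conv2d(ch, ch, 3, 1, 1)   # eleventh convolutional layer (kernel size assumed)
    def forward(self, x, cond):
        return x * self.gamma(cond) + self.beta(cond)

class SpatialAngularConv(nn.Module):
    """Convolve in the spatial layout, reorganize to the angular layout, convolve, reorganize back."""
    def __init__(self, ch, U, V):
        super().__init__()
        self.U, self.V = U, V
        self.spa = nn.Conv2d(ch, ch, 3, 1, 1)     # twelfth convolutional layer
        self.ang = nn.Conv2d(ch, ch, 3, 1, 1)     # thirteenth convolutional layer
    def forward(self, x):
        x = spatial_to_angular(self.spa(x), self.U, self.V)
        return angular_to_spatial(self.ang(x), self.U, self.V)

class ChannelAttention(nn.Module):
    """Global mean pooling, 64 -> 4 -> 64 channels, then channel-wise re-weighting."""
    def __init__(self, ch=64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # global mean pooling layer
        self.down = nn.Conv2d(ch, 4, 1)           # fourteenth convolutional layer
        self.up   = nn.Conv2d(4, ch, 1)           # fifteenth convolutional layer
    def forward(self, x):
        return x * self.up(torch.relu(self.down(self.pool(x))))  # the ReLU here is an assumption

class EnhanceResBlock(nn.Module):
    def __init__(self, ch=64, U=5, V=5):
        super().__init__()
        self.sft1, self.sft2 = SFTLayer(ch), SFTLayer(ch)
        self.sa1, self.sa2 = SpatialAngularConv(ch, U, V), SpatialAngularConv(ch, U, V)
        self.ca = ChannelAttention(ch)
    def forward(self, x, f_align):
        out = self.sa1(self.sft1(x, f_align))
        out = self.sa2(self.sft2(out, f_align))
        return x + self.ca(out)                   # skip connection back to the block input

f_en1 = EnhanceResBlock()(torch.rand(1, 64, 5 * 20, 5 * 30), torch.rand(1, 64, 5 * 20, 5 * 30))
```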
The second enhancement residual block has exactly the same internal data flow as the first enhancement residual block, with F_Align,2 taking the place of F_Align,1 as the conditioning input of its two spatial feature transform layers and F_En,1 taking the place of F_LR as the input that is modulated and that provides the skip connection. The sets output by the global mean pooling layer, the fourteenth convolutional layer, the fifteenth convolutional layer and the channel attention layer in the second enhancement residual block are denoted F_GAP,2, F_DS,2, F_US,2 and F_CA,2, respectively, where all feature values of each feature map in F_GAP,2 are identical, F_DS,2 contains 4 feature maps and F_US,2 contains 64 feature maps. All feature maps in F_CA,2 and all feature maps in F_En,1 are added element by element, and the resulting feature maps are taken as all feature maps output by the output end of the second enhancement residual block, whose set is F_En,2.
The input end of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3, and its output end outputs 64 feature maps with the same width and height as the feature maps in FAlign,3; the input end of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block also receives FAlign,3, and its output end likewise outputs 64 feature maps of the same width and height. The first spatial feature transform layer in the third enhanced residual block further receives all feature maps in FEn,2; all feature maps in FEn,2 are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element to all feature maps output by the eleventh convolutional layer, and all feature maps thus obtained are taken as all feature maps output by the output end of the first spatial feature transform layer in the third enhanced residual block.
The input end of the twelfth convolutional layer of the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps output by the first spatial feature transform layer in the third enhanced residual block, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps. The input end of the thirteenth convolutional layer of the first spatial-angle convolutional layer in the third enhanced residual block receives the result of this recombination operation, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the first spatial-angle convolutional layer in the third enhanced residual block.
The input end of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3, and its output end outputs 64 feature maps of the same width and height; the input end of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block also receives FAlign,3, and its output end likewise outputs 64 feature maps of the same width and height. The second spatial feature transform layer in the third enhanced residual block further receives all feature maps output by the first spatial-angle convolutional layer in the third enhanced residual block; these feature maps are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element to all feature maps output by the eleventh convolutional layer, and all feature maps thus obtained are taken as all feature maps output by the output end of the second spatial feature transform layer in the third enhanced residual block.
The input end of the twelfth convolutional layer of the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps output by the second spatial feature transform layer in the third enhanced residual block, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps. The input end of the thirteenth convolutional layer of the second spatial-angle convolutional layer in the third enhanced residual block receives the result of this recombination operation, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the second spatial-angle convolutional layer in the third enhanced residual block.
The input end of the global mean pooling layer in the channel attention layer in the third enhanced residual block receives all feature maps output by the second spatial-angle convolutional layer in the third enhanced residual block, and its output end outputs 64 feature maps, the set of which is denoted as FGAP,3; all feature values in each feature map in FGAP,3 are the same. The input end of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FGAP,3, and its output end outputs 4 feature maps, the set of which is denoted as FDS,3; the input end of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FDS,3, and its output end outputs 64 feature maps, the set of which is denoted as FUS,3. All feature maps in FUS,3 are multiplied element by element with all feature maps output by the second spatial-angle convolutional layer in the third enhanced residual block, all feature maps thus obtained are taken as all feature maps output by the output end of the channel attention layer in the third enhanced residual block, and the set formed by these feature maps is denoted as FCA,3;
All feature maps in FCA,3 and all feature maps in FEn,2 are added element by element, all feature maps thus obtained are taken as all feature maps output by the output end of the third enhanced residual block, and the set formed by these feature maps is FEn,3;
In the above, the convolution kernels of the tenth and eleventh convolutional layers in each of the first, second and third enhanced residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64, the numbers of output channels are all 64, and no activation function is adopted. The convolution kernels of the twelfth and thirteenth convolutional layers in each of the first, second and third enhanced residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64, the numbers of output channels are all 64, and the activation function adopted is "ReLU". The convolution kernel of the fourteenth convolutional layer in each of the first, second and third enhanced residual blocks has a size of 1 × 1, a convolution step of 1, 64 input channels and 4 output channels, and the activation function adopted is "ReLU". The convolution kernel of the fifteenth convolutional layer in each of the first, second and third enhanced residual blocks has a size of 1 × 1, a convolution step of 1, 4 input channels and 64 output channels, and the activation function adopted is "Sigmoid".
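For ease of understanding, a minimal PyTorch-style sketch of the channel attention layer described above is given below. It is illustrative only: the class and variable names are not part of the patent, and the framework and default parameter values are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Global mean pooling, a 1x1 conv (64 -> 4, "ReLU") and a 1x1 conv (4 -> 64, "Sigmoid");
        # the resulting per-channel weights rescale the input feature maps element by element.
        def __init__(self, channels=64, reduced=4):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)             # global mean pooling layer
            self.conv14 = nn.Conv2d(channels, reduced, 1)   # fourteenth convolutional layer
            self.conv15 = nn.Conv2d(reduced, channels, 1)   # fifteenth convolutional layer

        def forward(self, x):
            w = torch.sigmoid(self.conv15(torch.relu(self.conv14(self.pool(x)))))
            return x * w                                    # broadcast multiplication, i.e. the F_CA output

Pooling to a 1 × 1 map followed by a broadcast multiplication is equivalent to the description above in which every value of each pooled feature map is identical.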
Compared with the prior art, the invention has the advantages that:
1) The method of the invention takes into account that a conventional 2D camera can collect abundant spatial information, which can serve as compensation information for reconstructing the spatial resolution of the light field image; the light field image and the 2D high-resolution image are therefore used simultaneously, and on this basis an end-to-end convolutional neural network is constructed to make full use of the information of both, so as to reconstruct a high-spatial-resolution light field image, recover detailed texture information and maintain the parallax structure of the reconstruction result.
2) In order to establish the relation between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module to explore the correlation between the light field image and the 2D high-resolution image in a high-dimensional feature space, and further accurately registers the feature information of the 2D high-resolution image under the light field image; in addition, the method utilizes the constructed light field characteristic enhancement module to carry out multi-level fusion on the high-resolution characteristics obtained by registration and the shallow light field characteristics extracted from the low-spatial resolution light field image so as to effectively generate the high-spatial resolution light field characteristics, and further reconstruct the high-spatial resolution light field characteristics into the high-spatial resolution light field image.
3) In order to improve flexibility and practicability, the method adopts a pyramid network reconstruction mode, and the super-resolution results of specific scales are reconstructed at different pyramid levels so as to gradually improve the spatial resolution of the light field image and recover textures and details, so that multi-scale results (such as 2 x, 4 x and 8 x) can be reconstructed in one-time forward inference; in addition, the method adopts a weight sharing strategy under different pyramid levels so as to effectively reduce the parameter quantity of the constructed pyramid network and reduce the training burden.
Drawings
FIG. 1 is a block diagram of the overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network, namely a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3a is a schematic diagram of the structure of a light field feature enhancement module in a convolutional neural network, i.e., a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3b is a schematic diagram of the composition structure of the first spatial feature transform layer and the second spatial feature transform layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3c is a schematic diagram of the composition structure of the first spatial angle convolutional layer and the second spatial angle convolutional layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3d is a schematic diagram of the structure of the channel attention layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 4 is a schematic diagram illustrating a pyramid network reconstruction method established by the method of the present invention;
FIG. 5a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 5c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Wang et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested EPFL light field image database, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a bicubic interpolation method, wherein a sub-aperture image under a central coordinate is taken for display;
FIG. 6b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 6c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display;
FIG. 6f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
fig. 6i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested STFLytro light field image database.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
With the development of immersive media and technology, users are increasingly inclined to view visual content such as interactive and immersive images/videos. However, the conventional 2D imaging method can only collect the intensity information of the light in the scene, and cannot provide the depth information of the scene. In contrast, 3D imaging techniques can acquire more scene information, however, they contain limited depth information and are typically used for stereoscopic displays. As a new imaging technology, light field imaging is receiving wide attention, and can simultaneously acquire intensity and direction information of light in a scene in a single shooting, thereby more effectively recording the real world. Meanwhile, some optical instruments and devices based on light field imaging have been developed to promote the application and development of light field technology. Limited by the size of the imaging sensor, the 4D light field images acquired with a light field camera suffer from the problem of spatial and angular resolution being mutually compromised. In brief, while providing high angular resolution, the 4D light field image inevitably suffers from low spatial resolution, which seriously affects the practical applications of the 4D light field image, such as refocusing, depth estimation, etc., for which, the present invention proposes a light field image spatial super-resolution reconstruction method,
the method comprises the steps of acquiring a 2D high-resolution image while capturing a light field image through heterogeneous imaging, and further using the captured 2D high-resolution image as supplementary information to help enhance the spatial resolution of the light field image, wherein a spatial super-resolution network is constructed and mainly comprises an encoder, an aperture level feature registration module, a light field feature enhancement module, a decoder and the like; firstly, respectively extracting multi-scale features from an up-sampled low-spatial-resolution light field image, a blurred 2D high-resolution image and the 2D high-resolution image by using an encoder; then, learning the correspondence between the 2D high-resolution features and the low-resolution light field features through an aperture-level feature registration module so as to register the 2D high-resolution features under each sub-aperture image of the light field image and form registered high-resolution light field features; then, the light field characteristic enhancement module is used for enhancing shallow light field characteristics extracted from an input light field image by utilizing the high-resolution light field characteristics obtained through registration to obtain enhanced high-resolution light field characteristics; finally, reconstructing the enhanced high-resolution light field characteristics into a high-quality high-spatial resolution light field image by using a decoder; in addition, a pyramid network reconstruction architecture is adopted to reconstruct a high spatial resolution light field image of a specific up-sampling scale at each pyramid level, and then multi-scale reconstruction results can be generated simultaneously.
The invention provides a light field image space super-resolution reconstruction method, the overall implementation flow block diagram of which is shown in figure 1, and the method comprises the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution of W multiplied by H and angular resolution of V multiplied by U, corresponding Num color three-channel 2D high-resolution images with resolution of alpha W multiplied by alpha H, and corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution of alpha W multiplied by alpha H and angular resolution of V multiplied by U; where Num > 1, Num in this embodiment is 200, W × H in this embodiment is 75 × 50, V × U is 5 × 5, α represents a spatial resolution improvement multiple, and a is greater than 1, and in this embodiment, α is 8.
Step 2: constructing a convolutional neural network as a spatial super-resolution network: as shown in fig. 2, the spatial super-resolution network includes an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high resolution features, a shallow feature extraction layer for extracting shallow features from a low spatial resolution light field image, a light field feature enhancement module for fusing light field features and 2D high resolution features, a spatial attention block for mitigating registration errors in coarse-scale features, and a decoder for reconstructing potential features into a light field image.
For the encoder, it is composed of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array, with a width of αsW × V and a height of αsH × U, obtained by recombining the single-channel image LLR of the low-spatial-resolution light field image (spatial resolution W × H, angular resolution V × U) after spatial-resolution up-sampling; the single-channel image of the blurred 2D high-resolution image, with a width of αsW and a height of αsH; and the single-channel image of the 2D high-resolution image, with a width of αsW and a height of αsH, denoted as IHR. For each of the three inputs, the output end of the first convolutional layer outputs 64 feature maps of the same width and height as that input; the set of feature maps output for IHR is denoted as YHR,0. The input end of the second convolutional layer receives, in parallel, all feature maps output by the first convolutional layer for the three inputs, and for each of them outputs 64 feature maps whose width and height are half those of the corresponding input feature maps; the set of feature maps output for YHR,0 is denoted as YHR,1. The input end of the first residual block receives, in parallel, all feature maps output by the second convolutional layer for the three inputs, and for each of them outputs 64 feature maps of the same width and height; the set of feature maps output for YHR,1 is denoted as YHR,2. The input end of the second residual block receives, in parallel, all feature maps output by the first residual block for the three inputs, and for each of them outputs 64 feature maps of the same width and height; the set of feature maps output for YHR,2 is denoted as YHR,3. Here, the sub-aperture image array with a width of αsW × V and a height of αsH × U is obtained by up-sampling the single-channel image LLR of the low-spatial-resolution light field image with the existing bicubic interpolation and recombining the result; the blurred 2D high-resolution image is obtained by first down-sampling IHR with bicubic interpolation and then up-sampling it with bicubic interpolation; αs represents the spatial-resolution sampling factor, αs takes the value 2 in this embodiment, αs³ = α, and the up-sampling factor of the bicubic interpolation up-sampling and the down-sampling factor of the bicubic interpolation down-sampling both take the value αs. The convolution kernel of the first convolutional layer has a size of 3 × 3, a convolution step of 1, 1 input channel and 64 output channels; the convolution kernel of the second convolutional layer has a size of 3 × 3, a convolution step of 2, 64 input channels and 64 output channels; both the first and the second convolutional layer adopt the "ReLU" activation function.
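The following PyTorch-style sketch illustrates the encoder structure just described. It is not the claimed implementation: class names are invented, and the assumption that the same (weight-shared) layers process each of the three inputs is an interpretation of the "three inputs in parallel" wording.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # two 3x3 convs (stride 1, 64 channels), "ReLU" on the first, none on the second,
        # with an identity skip connection (matches the residual blocks described later)
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, x):
            return x + self.conv2(torch.relu(self.conv1(x)))

    class Encoder(nn.Module):
        # first conv: 3x3, stride 1, 1 -> 64, ReLU; second conv: 3x3, stride 2, 64 -> 64, ReLU
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)
            self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)
            self.res1 = ResidualBlock()
            self.res2 = ResidualBlock()

        def forward(self, x):
            f0 = torch.relu(self.conv1(x))    # full-resolution features
            f1 = torch.relu(self.conv2(f0))   # half-resolution features
            f2 = self.res1(f1)
            f3 = self.res2(f2)
            return f0, f1, f2, f3             # multi-scale features for one input

The encoder sketched above would be applied separately to the up-sampled light field sub-aperture array, the blurred 2D high-resolution image and the 2D high-resolution image, yielding in particular YHR,0 to YHR,3 for IHR.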
For the aperture-level feature registration module, its input end receives three types of feature maps: the first type is feature maps extracted by the encoder from the up-sampled light field sub-aperture image array, the second type is feature maps extracted by the encoder from the blurred 2D high-resolution image, and the third type comprises four inputs, namely all feature maps in YHR,0, all feature maps in YHR,1, all feature maps in YHR,2 and all feature maps in YHR,3. In the aperture-level feature registration module, the feature maps of the second type and the feature maps in YHR,0, YHR,1, YHR,2 and YHR,3 are first each repeated V × U times, so that the repeated second-type feature maps and the repeated feature maps of YHR,1, YHR,2 and YHR,3 match the width and height of the light field feature maps of the first type, and the repeated feature maps of YHR,0 reach a width of αsW × V and a height of αsH × U, matching the full-resolution light field feature maps. Then the existing block matching is performed between the light field feature maps of the first type and the repeated second-type feature maps, and a coordinate index map of the same width and height as these feature maps, denoted as PCI, is obtained after block matching. Next, according to PCI, all feature maps in YHR,1 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,1; likewise, according to PCI, all feature maps in YHR,2 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,2; and according to PCI, all feature maps in YHR,3 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,3. PCI is then up-sampled by bicubic interpolation to obtain a coordinate index map with a width of αsW × V and a height of αsH × U, and according to this up-sampled index map all feature maps in YHR,0 are registered in spatial position to obtain 64 registration feature maps with a width of αsW × V and a height of αsH × U, whose set is denoted as FAlign,0. The output end of the aperture-level feature registration module outputs all feature maps in FAlign,0, FAlign,1, FAlign,2 and FAlign,3. The precision measure used for block matching is a texture-and-structure similarity index, the block size for block matching is 3 × 3, and the up-sampling factor of the bicubic interpolation up-sampling is αs. Block matching is performed on the high-level features because high-level features describe the similarity of images at the semantic level more closely while suppressing irrelevant textures; the coordinate index map PCI obtained by block matching therefore reflects the spatial-position registration relationship between the two sets of matched feature maps. Since the convolution operation does not change the spatial-position information of a feature map, PCI also reflects the spatial-position registration relationship between the feature maps in YHR,1, YHR,2 and YHR,3 and the corresponding light field feature maps, and the index map obtained by bicubic interpolation up-sampling of PCI reflects the spatial-position registration relationship between the feature maps in YHR,0 and the full-resolution light field feature maps.
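A toy sketch of the registration idea is given below. It is deliberately simplified: the patent uses a texture-and-structure similarity index as the matching criterion, whereas plain L2 patch distance is used here only to keep the example short, and the brute-force pairwise comparison is far more expensive than a practical block-matching implementation.

    import torch
    import torch.nn.functional as F

    def block_match(lf_feat, hr_feat, block=3):
        # For every 3x3 block of the light field features, find the best-matching block of
        # the (repeated) 2D high-resolution features and return its flattened coordinate index.
        lf_patches = F.unfold(lf_feat, block, padding=block // 2)   # (N, C*3*3, H*W)
        hr_patches = F.unfold(hr_feat, block, padding=block // 2)
        d = torch.cdist(lf_patches.transpose(1, 2), hr_patches.transpose(1, 2))
        return d.argmin(dim=-1)                                     # coordinate index map P_CI

    def align(hr_feat, index):
        # Re-arrange the 2D high-resolution features according to the coordinate index map so
        # that they are registered under every sub-aperture position of the light field.
        n, c, h, w = hr_feat.shape
        flat = hr_feat.reshape(n, c, h * w)
        idx = index.unsqueeze(1).expand(-1, c, -1)
        return torch.gather(flat, 2, idx).reshape(n, c, h, w)

In the module described above, block_match would be run once on the deepest features, and align would then be applied to YHR,1, YHR,2 and YHR,3 (and, after up-sampling the index map, to YHR,0).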
For the shallow feature extraction layer, it is composed of one fifth convolutional layer, the input end of which receives the single-channel image LLR of the low-spatial-resolution light field image with spatial resolution W × H and angular resolution V × U; the output end of the fifth convolutional layer outputs 64 feature maps with a width of W × V and a height of H × U, and the set formed by all output feature maps is denoted as FLR. The convolution kernel of the fifth convolutional layer has a size of 3 × 3, a convolution step of 1, 1 input channel and 64 output channels, and the activation function adopted by the fifth convolutional layer is "ReLU".
For the light field feature enhancement module, as shown in fig. 3a, it is composed of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block connected in sequence. The input end of the first enhancement residual block receives all feature maps in FAlign,1 and all feature maps in FLR; since αs takes the value 2, (αsW × V)/2 is equal to W × V and (αsH × U)/2 is equal to H × U, i.e. the feature maps in FLR have the same size as the feature maps in FAlign,1. The output end of the first enhancement residual block outputs 64 feature maps of this size, and the set of all output feature maps is denoted as FEn,1; the input end of the second enhancement residual block receives all feature maps in FAlign,2 and all feature maps in FEn,1, and its output end outputs 64 feature maps of the same size, whose set is denoted as FEn,2; the input end of the third enhancement residual block receives all feature maps in FAlign,3 and all feature maps in FEn,2, and its output end outputs 64 feature maps of the same size, whose set is denoted as FEn,3.
For the spatial attention block, it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input end of the sixth convolutional layer receives all feature maps in FAlign,0, and the output end of the sixth convolutional layer outputs 64 spatial attention feature maps with a width of αsW × V and a height of αsH × U; the set of all output spatial attention feature maps is denoted as FSA1. The input end of the seventh convolutional layer receives all spatial attention feature maps in FSA1, and the output end of the seventh convolutional layer outputs 64 spatial attention feature maps with a width of αsW × V and a height of αsH × U; the set of all output spatial attention feature maps is denoted as FSA2. All feature maps in FAlign,0 are multiplied element by element with all spatial attention feature maps in FSA2, and the set formed by all feature maps thus obtained is denoted as FWA,0; all feature maps in FWA,0 are taken as all feature maps output by the output end of the spatial attention block. The convolution kernels of the sixth and seventh convolutional layers both have a size of 3 × 3, the convolution steps are both 1, the numbers of input channels are both 64, the numbers of output channels are both 64, the activation function adopted by the sixth convolutional layer is "ReLU", and the activation function adopted by the seventh convolutional layer is "Sigmoid".
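An illustrative PyTorch-style sketch of this spatial attention block follows; names and the framework are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        # sixth conv: 3x3, 64 -> 64, "ReLU"; seventh conv: 3x3, 64 -> 64, "Sigmoid";
        # the resulting attention map multiplies F_Align,0 element by element to give F_WA,0
        def __init__(self, channels=64):
            super().__init__()
            self.conv6 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv7 = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, f_align0):
            a = torch.sigmoid(self.conv7(torch.relu(self.conv6(f_align0))))
            return f_align0 * a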
For the decoder, it is composed of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input end of the third residual block receives all feature maps in FEn,3, and the output end of the third residual block outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted as FDec,1. The input end of the fourth residual block receives all feature maps in FDec,1, and the output end of the fourth residual block outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted as FDec,2. The input end of the sub-pixel convolutional layer receives all feature maps in FDec,2; the output end of the sub-pixel convolutional layer outputs 256 feature maps of the same width and height, and these 256 feature maps are further converted into 64 feature maps with a width of αsW × V and a height of αsH × U; the set of all converted feature maps is denoted as FDec,3. The input end of the eighth convolutional layer receives the result of the element-by-element addition of all feature maps in FDec,3 and all feature maps in FWA,0, and the output end of the eighth convolutional layer outputs 64 feature maps with a width of αsW × V and a height of αsH × U; the set of all output feature maps is denoted as FDec,4. The input end of the ninth convolutional layer receives all feature maps in FDec,4, and the output end of the ninth convolutional layer outputs a reconstructed single-channel light field image with a width of αsW × V and a height of αsH × U; this reconstructed single-channel light field image is recombined into a high-spatial-resolution single-channel light field image with spatial resolution αsW × αsH and angular resolution V × U, which is denoted as LSR. The convolution kernel of the sub-pixel convolutional layer has a size of 3 × 3, a convolution step of 1, 64 input channels and 256 output channels; the convolution kernel of the eighth convolutional layer has a size of 3 × 3, a convolution step of 1, 64 input channels and 64 output channels; the convolution kernel of the ninth convolutional layer has a size of 1 × 1, a convolution step of 1, 64 input channels and 1 output channel; the activation functions adopted by the sub-pixel convolutional layer and the eighth convolutional layer are both "ReLU", and the ninth convolutional layer does not adopt an activation function.
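A minimal PyTorch-style sketch of this decoder is given below. The pixel-shuffle step realizes the 256-to-64-channel conversion with a twofold spatial enlargement described above; class names are illustrative only.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.conv1 = nn.Conv2d(c, c, 3, padding=1)
            self.conv2 = nn.Conv2d(c, c, 3, padding=1)
        def forward(self, x):
            return x + self.conv2(torch.relu(self.conv1(x)))

    class Decoder(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.res3, self.res4 = ResidualBlock(c), ResidualBlock(c)
            self.subpixel = nn.Conv2d(c, c * 4, 3, padding=1)   # 64 -> 256
            self.shuffle = nn.PixelShuffle(2)                   # 256 -> 64, x2 enlargement
            self.conv8 = nn.Conv2d(c, c, 3, padding=1)
            self.conv9 = nn.Conv2d(c, 1, 1)

        def forward(self, f_en3, f_wa0):
            x = self.res4(self.res3(f_en3))
            x = self.shuffle(torch.relu(self.subpixel(x)))      # F_Dec,3
            x = torch.relu(self.conv8(x + f_wa0))               # skip from the spatial attention block
            return self.conv9(x)                                # reconstructed Y-channel light field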
And step 3: performing color space conversion on each low spatial resolution light field image in the training set, the corresponding 2D high resolution image and the corresponding reference high spatial resolution light field image, namely converting the RGB color space into the YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of each low spatial resolution light field image into a sub-aperture image array with the width of W multiplied by V and the height of H multiplied by U for representation; then, a sub-aperture image array recombined with Y-channel images of all the light field images with low spatial resolution in the training set, a corresponding Y-channel image of the 2D high-resolution image and a corresponding Y-channel image of the reference light field image with high spatial resolution form the training set; and then constructing a pyramid network, and training by using a training set, wherein the concrete process is as follows:
step 3_ 1: as shown in fig. 4, the constructed spatial super-resolution networks are copied three times and cascaded, the weight of each spatial super-resolution network is shared, that is, the parameters are all the same, and the overall network formed by the three spatial super-resolution networks is defined as a pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set to be equal to αsValues are the same, αsWhen the value is 2, the spatial resolution of the light field image is improved by 2 times, so that the final reconstruction scale can reach 8, namely, alpha is alphas 3=8。
Step 3_2: the Y-channel image of each reference high-spatial-resolution light field image in the training set is down-sampled twice in spatial resolution, and the images obtained after down-sampling are taken as label images; the Y-channel image of each 2D high-resolution image in the training set is down-sampled twice in the same way, and the images obtained after down-sampling are taken as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these Y-channel images once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the first spatial super-resolution network, and all 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network are input into the first spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
Step 3_3: the Y-channel image of each reference high-spatial-resolution light field image in the training set is down-sampled once in spatial resolution, and the images obtained after down-sampling are taken as label images; the Y-channel image of each 2D high-resolution image in the training set is down-sampled once in the same way, and the images obtained after down-sampling are taken as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the αs-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these reconstructions once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the second spatial super-resolution network, and all 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network are input into the second spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs²-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
Step 3_4: the Y-channel image of each reference high-spatial-resolution light field image in the training set is taken as a label image; the Y-channel image of each 2D high-resolution image in the training set is taken as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the αs²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these reconstructions once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the third spatial super-resolution network, and all 2D high-resolution Y-channel images for the third spatial super-resolution network in the pyramid network are input into the third spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs³-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
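The per-level preparation of labels and 2D high-resolution Y-channel images in steps 3_2 to 3_4 can be summarised by the following sketch. It assumes OpenCV bicubic resizing and NumPy-style image arrays; function names are illustrative only.

    import cv2

    def bicubic(img, factor):
        # bicubic rescaling by `factor` (> 1 up-samples, < 1 down-samples)
        h, w = img.shape[:2]
        return cv2.resize(img, (round(w * factor), round(h * factor)),
                          interpolation=cv2.INTER_CUBIC)

    def prepare_level_data(hr_lf_y, hr_2d_y, level, alpha_s=2, levels=3):
        # For pyramid level `level` (1-based): the label is the reference high-resolution
        # light field Y image down-sampled (levels - level) times, the 2D high-resolution Y
        # image for that level is down-sampled the same number of times, and its blurred
        # version is produced by one further down-sampling followed by one up-sampling.
        n_down = levels - level
        label = bicubic(hr_lf_y, (1.0 / alpha_s) ** n_down)
        hr_2d = bicubic(hr_2d_y, (1.0 / alpha_s) ** n_down)
        blurred_2d = bicubic(bicubic(hr_2d, 1.0 / alpha_s), alpha_s)
        return label, hr_2d, blurred_2d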
After the training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network in the pyramid network are obtained, yielding a well-trained spatial super-resolution network model. The network model realizes a specific super-resolution reconstruction scale at each pyramid level, so that multi-scale super-resolution results can be output in a single forward inference (namely scales of 2 ×, 4 × and 8 × when αs takes the value 2); in addition, by sharing the weights of the spatial super-resolution network across the pyramid levels, the number of network parameters can be effectively reduced and the training burden lightened.
And 4, step 4: randomly selecting a low-spatial-resolution light field image with three color channels and a corresponding 2D high-resolution image with three color channels as test images; then, converting the low-spatial-resolution light field image of the three color channels and the corresponding 2D high-resolution image of the three color channels from an RGB color space to a YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of the light field image with low spatial resolution into a sub-aperture image array for representation; inputting blurred 2D high-resolution Y-channel images obtained by performing primary spatial resolution down-sampling and primary spatial resolution up-sampling on the Y-channel images of the low-spatial resolution light field images, the Y-channel images of the 2D high-resolution images and the Y-channel images of the 2D high-resolution images into a spatial super-resolution network model, and testing to obtain reconstructed high-spatial resolution Y-channel light field images corresponding to the Y-channel images of the low-spatial resolution light field images; then performing bicubic interpolation up-sampling on the Cb channel image and the Cr channel image of the low-spatial-resolution light field image respectively to obtain a reconstructed high-spatial-resolution Cb channel light field image corresponding to the Cb channel image of the low-spatial-resolution light field image and a reconstructed high-spatial-resolution Cr channel light field image corresponding to the Cr channel image of the low-spatial-resolution light field image; and finally, cascading the obtained reconstructed high-spatial-resolution Y-channel light field image, the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image on the dimension of a color channel, and converting the cascading result into an RGB color space again to obtain the reconstructed high-spatial-resolution light field image of the color three channels corresponding to the low-spatial-resolution light field image.
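The colour handling at test time can be illustrated as follows. This sketch processes a single sub-aperture image for brevity, uses OpenCV (which orders the channels as Y, Cr, Cb), and assumes model_y wraps the trained Y-channel pyramid network and returns a NumPy array; it is not the claimed implementation.

    import cv2

    def reconstruct_color(lf_lr_rgb, model_y):
        # Y goes through the trained network; Cb and Cr are bicubically up-sampled;
        # the three channels are then merged and converted back to RGB.
        ycrcb = cv2.cvtColor(lf_lr_rgb, cv2.COLOR_RGB2YCrCb)
        y, cr, cb = cv2.split(ycrcb)
        y_sr = model_y(y)                                   # reconstructed Y-channel light field
        h, w = y_sr.shape
        cr_sr = cv2.resize(cr, (w, h), interpolation=cv2.INTER_CUBIC)
        cb_sr = cv2.resize(cb, (w, h), interpolation=cv2.INTER_CUBIC)
        sr_ycrcb = cv2.merge([y_sr, cr_sr, cb_sr])
        return cv2.cvtColor(sr_ycrcb, cv2.COLOR_YCrCb2RGB)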
In this embodiment, in step 2, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each being composed of a third convolutional layer and a fourth convolutional layer connected in sequence.
The input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely the three sets of feature maps output by the second convolutional layer for the up-sampled light field sub-aperture array, for the blurred 2D high-resolution image and for IHR (the last set being YHR,1); for each of the three inputs, the output end of the third convolutional layer in the first residual block outputs 64 feature maps of the same width and height. The input end of the fourth convolutional layer in the first residual block receives, in parallel, the three sets of feature maps output by the third convolutional layer, and for each of them outputs 64 feature maps of the same width and height. Each of the three inputs of the first residual block is then added element by element to the corresponding set of feature maps output by the fourth convolutional layer, and the three sets of feature maps thus obtained are taken as the outputs of the first residual block for the three inputs; in particular, all feature maps in YHR,1 are added element by element to the corresponding feature maps output by the fourth convolutional layer, and the resulting set is YHR,2.
The second residual block processes its three parallel inputs in exactly the same way: its third and fourth convolutional layers each output 64 feature maps of the same width and height for each input, each input is added element by element to the corresponding set of feature maps output by the fourth convolutional layer, and the three resulting sets of feature maps are the outputs of the second residual block; in particular, all feature maps in YHR,2 are added element by element to the corresponding feature maps output by the fourth convolutional layer, and the resulting set is YHR,3.
The input end of the third convolutional layer in the third residual block receives all feature maps in FEn,3, and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the third residual block receives these feature maps, and its output end outputs 64 feature maps of the same width and height. All feature maps in FEn,3 are added element by element to all feature maps output by the fourth convolutional layer in the third residual block, the feature maps thus obtained are taken as all feature maps output by the output end of the third residual block, and the set formed by them is FDec,1.
The input end of the third convolutional layer in the fourth residual block receives all feature maps in FDec,1, and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the fourth residual block receives these feature maps, and its output end outputs 64 feature maps of the same width and height. All feature maps in FDec,1 are added element by element to all feature maps output by the fourth convolutional layer in the fourth residual block, the feature maps thus obtained are taken as all feature maps output by the output end of the fourth residual block, and the set formed by them is FDec,2.
In the above, the convolution kernels of the third convolutional layer and the fourth convolutional layer in each of the first, second, third and fourth residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64 and the numbers of output channels are all 64; the activation function adopted by the third convolutional layer in each of these residual blocks is "ReLU", and the fourth convolutional layer does not adopt an activation function.
In this embodiment, in step 2, as shown in fig. 3a, 3b, 3c and 3d, the first enhancement residual block, the second enhancement residual block and the third enhancement residual block have the same structure, and each of them is composed of a first spatial characteristic transformation layer, a first spatial angle convolution layer, a second spatial characteristic transformation layer, a second spatial angle convolution layer and a channel attention layer which are connected in sequence, the first spatial characteristic transformation layer and the second spatial characteristic transformation layer have the same structure, and each of them is composed of a tenth convolution layer and an eleventh convolution layer which are parallel, the first spatial angle convolution layer and the second spatial angle convolution layer have the same structure, and each of them is composed of a twelfth convolution layer and a thirteenth convolution layer which are connected in sequence, and the channel attention layer is composed of a global mean value pooling layer, a fourteenth convolution layer and a fifteenth convolution layer which are connected in sequence.
An input of a tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the first spatial feature transform layer in the first enhanced residual block receives FLRAll feature maps in (1), will FLRAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the first spatial feature conversion layer in the first enhanced residual block, and a set formed by the feature maps is recorded as a set
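The spatial feature transform layer described above amounts to modulating the light field features with a scale map and a shift map predicted from the registered high-resolution features. A possible sketch, assuming (as one reading of the text) that the tenth convolutional layer supplies the multiplicative map and the eleventh the additive map, neither with an activation function:

```python
import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    """Sketch of the spatial feature transform layer: the registered
    high-resolution features (cond) are mapped by two parallel 3x3
    convolutions to a scale map and a shift map, which modulate the
    light field features element-wise: out = feat * scale + shift."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.scale_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "tenth" conv (assumed role)
        self.shift_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "eleventh" conv (assumed role)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale = self.scale_conv(cond)
        shift = self.shift_conv(cond)
        return feat * scale + shift
```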
An input of a twelfth of the first spatial angle convolutional layers in the first enhanced residual block receivesOf the first spatial angle convolutional layer in the first enhancement residual block, the output end of the twelfth convolutional layer of the first spatial angle convolutional layer outputs 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a re-assembly operation from a spatial dimension to an angular dimension (the re-assembly operation is a conventional processing means of light field images, the re-assembly operation only changes the arrangement order of each feature value in the feature map, and does not change the size of the feature value), and an input end of a thirteenth convolutional layer in the first spatial angle convolutional layer in the first enhanced residual block receivesThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the first enhancement residual block outputs 64 widths as the result of the reorganization operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll the characteristics ofThe maps are recombined from an angle dimension to a space dimension, all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the first space angle convolution layer in the first enhanced residual block, and a set formed by the feature maps is recorded as a set
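The recombination operation between the spatial and angular dimensions is a fixed re-indexing of feature values between a sub-aperture-image layout and a macro-pixel layout. A possible sketch is given below; the assumption that the U × V sub-aperture images are tiled along the height and width axes in that order is one common convention and is not stated explicitly above:

```python
import torch

def spatial_to_angular(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Rearrange a (B, C, U*h, V*w) sub-aperture-image array into a
    (B, C, h*U, w*V) macro-pixel array; values are only re-ordered."""
    B, C, H, W = x.shape
    h, w = H // U, W // V
    x = x.view(B, C, U, h, V, w)
    x = x.permute(0, 1, 3, 2, 5, 4)       # swap angular and spatial axes
    return x.reshape(B, C, h * U, w * V)

def angular_to_spatial(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Inverse of spatial_to_angular."""
    B, C, H, W = x.shape
    h, w = H // U, W // V
    x = x.view(B, C, h, U, w, V)
    x = x.permute(0, 1, 3, 2, 5, 4)
    return x.reshape(B, C, U * h, V * w)
```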
The input terminal of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the second spatial feature transform layer in the first enhanced residual block receivesAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the first enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layers in the first enhancement residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the first enhanced residual error block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming recombination operation from angle dimension to space dimension on all feature maps in the first enhancement residual block, taking all feature maps obtained after the recombination operation as all feature maps output by the output end of the second spatial angle convolution layer in the first enhancement residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the first enhanced residual block outputs 64 feature maps with the width ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,1,FGAP,1In each characteristic diagram ofAll the eigenvalues of (1) are the same (the global mean pooling layer is to calculate the global mean value for each eigenvalue received at the input end independently, and then convert one eigenvalue into a single eigenvalue, and then copy the obtained eigenvalue to restore the space size, i.e. copy the single eigenvalueMultiple, get a width ofAnd has a height ofCharacteristic map of (1); the input of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FGAP,1The output end of the fourteenth convolution layer in the channel attention layer in the first enhancement residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,1(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FDS,1Of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,1(ii) a F is to beUS,1All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, and a set formed by the feature maps is marked as FCA,1。
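A compact sketch of this channel attention layer (global mean pooling, a 1 × 1 convolution reducing 64 to 4 channels, a 1 × 1 convolution expanding back to 64 channels, and element-wise re-weighting) is given below; broadcasting is used in place of the explicit copy-back to the full spatial size, and the activation choices follow the parameter description given later:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention layer: global average pooling,
    a 1x1 conv reducing 64 -> 4 channels with ReLU, a 1x1 conv expanding
    4 -> 64 channels with Sigmoid, and element-wise re-weighting."""

    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                      # global mean pooling
        self.down = nn.Conv2d(channels, reduced, kernel_size=1)  # "fourteenth" conv
        self.up = nn.Conv2d(reduced, channels, kernel_size=1)    # "fifteenth" conv
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)               # one value per channel
        w = self.relu(self.down(w))
        w = self.sigmoid(self.up(w))   # per-channel weights in (0, 1)
        return x * w                   # broadcasting replaces the explicit spatial copy
```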
F is to beCA,1All feature maps in (1) and (F)LRAll the feature maps in the first enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the first enhancement residual block, and the set formed by the feature maps is FEn,1。
The input terminal of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedFirst spatial feature transform layer in second enhanced residual blockReceiving end of FEn,1All feature maps in (1), will FEn,1All the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the first spatial feature transform layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the first spatial angle convolutional layers in the second enhanced residual block receivesOf the twelfth convolutional layer of the first spatial angle convolutional layer in the second enhanced residual block outputs 64 width signalsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the second enhancement residual block receivingThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the second enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, using all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a first space angle convolution layer in a second enhanced residual block, and recording a set formed by the feature maps as a set
An input of a tenth convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving end of second spatial feature transform layer in second enhanced residual blockAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the second enhanced residual block receivesOf the second spatial angle convolutional layer in the second enhancement residual block, and an output terminal of a twelfth convolutional layer of the second spatial angle convolutional layerOut of 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the second enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the second enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, using all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a second space angle convolution layer in a second enhanced residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the second enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the second enhanced residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,2,FGAP,2All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block receives FGAP,2The output end of the fourteenth convolution layer in the channel attention layer in the second enhanced residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,2(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block receives FDS,2Of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,2(ii) a F is to beUS,2All the characteristic diagrams in (1) andthe obtained all feature maps are used as all feature maps output by the output end of the channel attention layer in the second enhanced residual block, and the set formed by the feature maps is marked as FCA,2。
F is to beCA,2All feature maps in (1) and (F)En,1All the feature maps in the first enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the second enhancement residual block, and the set formed by the feature maps is FEn,2。
An input of a tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving F at receiving end of first spatial feature transform layer in third enhanced residual blockEn,2All feature maps in (1), will FEn,2All the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the first spatial feature conversion layer in the third enhanced residual block, and a set formed by the feature maps is recorded as a set
An input of a twelfth of the first spatial angle convolutional layers in the third enhanced residual block receivesOf the twelfth convolutional layer of the first spatial angle convolutional layer in the third enhanced residual block outputs 64 width signalsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming a recombination operation of converting from a spatial dimension to an angular dimension on all feature maps in the third enhanced residual block, performing a first spatial-angular convolution in the third enhanced residual blockInput reception of a thirteenth of the layersThe output end of the thirteenth convolutional layer of the first spatial angle convolutional layer in the third enhanced residual block outputs 64 widths as the result of the recombination operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll feature maps in the third enhancement residual block are recombined from an angle dimension to a space dimension, all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the first space angle convolution layer in the third enhancement residual block, and a set formed by the feature maps is recorded as a set
An input of a tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 widthAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving end of second spatial feature transform layer in third enhanced residual blockAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the third enhanced residual block, and a set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the third enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layer in the third enhanced residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the third enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the third enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll feature maps in the third enhancement residual block are recombined from an angle dimension to a space dimension, and all feature maps obtained after the recombination operation are used as all feature maps output by the output end of the second space angle convolution layer in the third enhancement residual blockThe set of these feature maps is described as
The input of the global mean pooling layer in the channel attention layer in the third enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the third enhanced residual block outputs 64 width imagesAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,3,FGAP,3All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FGAP,3The output end of the fourteenth convolution layer in the channel attention layer in the third enhanced residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,3(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FDS,3Of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,3(ii) a F is to beUS,3All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the third enhanced residual block, and a set formed by the feature maps is marked as FCA,3。
F is to beCA,3All feature maps in (1) and (F)En,2All the feature maps in the third enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the third enhancement residual block, and the set formed by the feature maps is FEn,3。
In the above, the sizes of convolution kernels of the tenth convolution layer and the eleventh convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are all 3 × 3, the convolution step lengths are all 1, the number of input channels is all 64, the number of output channels is all 64, and no activation function is adopted, the sizes of convolution kernels of the twelfth convolution layer and the thirteenth convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are all 3 × 3, the convolution step lengths are all 1, the number of input channels is all 64, the number of output channels is 64, the adopted activation functions are all "ReLU", the sizes of convolution kernels of the fourteenth convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are 1 × 1, the convolution step lengths are 1, the number of input channels is 64, the number of output channels is 4, and the adopted activation function is "ReLU", the size of the convolution kernel of the fifteenth convolution layer in each of the first, second, and third enhanced residual blocks is 1 × 1, the convolution step is 1, the number of input channels is 4, the number of output channels is 64, and the employed activation function is "Sigmoid".
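Putting the pieces together, a schematic sketch of one enhanced residual block with the parameters listed above may look as follows; it reuses the SpatialFeatureTransform, ChannelAttention and spatial/angular reorganization sketches given earlier, and the module names and calling convention are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    """Sketch of a spatial-angle convolution layer: a 3x3 conv in the
    sub-aperture (spatial) layout followed by a 3x3 conv in the
    macro-pixel (angular) layout, both with ReLU."""

    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.U, self.V = U, V
        self.spatial_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "twelfth" conv
        self.angular_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "thirteenth" conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.spatial_conv(x))
        x = spatial_to_angular(x, self.U, self.V)
        x = self.relu(self.angular_conv(x))
        return angular_to_spatial(x, self.U, self.V)

class EnhancedResidualBlock(nn.Module):
    """Sketch of one enhanced residual block: SFT -> spatial-angle conv ->
    SFT -> spatial-angle conv -> channel attention, plus a skip connection
    from the light field feature input."""

    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sac1 = SpatialAngularConv(channels, U, V)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sac2 = SpatialAngularConv(channels, U, V)
        self.ca = ChannelAttention(channels)

    def forward(self, lf_feat: torch.Tensor, aligned_feat: torch.Tensor) -> torch.Tensor:
        x = self.sft1(lf_feat, aligned_feat)
        x = self.sac1(x)
        x = self.sft2(x, aligned_feat)
        x = self.sac2(x)
        x = self.ca(x)
        return x + lf_feat   # e.g. F_CA,1 + F_LR -> F_En,1
```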
To further illustrate the feasibility and effectiveness of the method of the present invention, experiments were conducted.
The method is implemented with the PyTorch deep learning framework. The light field images used for training and testing come from existing light field image databases, which include real-world scenes and synthetic scenes and are freely available for download over the internet. To ensure the reliability and robustness of the test, 200 light field images are randomly selected to form the training image set and 70 light field images are selected to form the test image set, where the light field images in the training image set and those in the test image set do not overlap. The basic information of the light field image databases used by the training and test image sets is shown in Table 1. The 4 light field image databases EPFL [1], INRIA [2], STFLytro [6] and Kalantari et al. [7] were captured with a Lytro light field camera, so the resulting light field images belong to narrow-baseline light field data; the STFGantry [5] light field image database was captured by moving a conventional camera mounted on a gantry, so the resulting light field images have a larger baseline range and belong to wide-baseline light field data; the light field images in the HCI new [3] and HCI old [4] light field image databases are artificially synthesized and also belong to wide-baseline light field data.
TABLE 1 basic information of light field image database used for training and testing image sets
The reference information (or download website) corresponding to the light field image database used by the training image set and the testing image set is as follows:
[1] rerabek M, Edbrohimi T.New light Field Image data set [ C ]//2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).2016 (New lightfield Image Dataset [ C ]// Eighth International Conference on Quality of Multimedia Experience, 2016.)
[2] Pendu M L, Jiang X, Guilleot C.light Field input Propagation Low speed Matrix Completion [ J ]. IEEE Transactions on Image Processing,2018,27(4): 1981. sup. supplement 1993 (Propagation of light Field repair by Low-Rank Matrix Completion, IEEE Image Processing journal, 2018,27(4): 1981. sup. supplement 1993)
[3] Honauer K, Johannsen O, Kondermann D, et al.A Dataset and Evaluation method for Depth Estimation on 4D Light Fields [ C ]// Asian Conference on Computer Vision,2016 (one Dataset for 4D Light field Depth Estimation and Evaluation method [ C ]// Asian Computer Vision Conference, 2016.)
[4] Wanner S, Meister S, B Goldquecke. Datases and Benchmarks for Densely Sampled 4D Light Fields [ C ]// International Symposium on Vision Modeling and Visualization,2013 (data sets for dense sampling 4D Light Fields and reference [ C ]// visual Modeling and Visualization International seminar, 2013.)
[5] Vaish V, Adams a. the (New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University,2008. ((New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University, 2008.)
[6] Raj A S, Lowney M, Shah R, Wetzstein G.Stanford Lytro Light Field Archive, Available: http:// lightfields.stanford.edu/index.html. (Stanford Lytro lightfield Archive, Available website: http:// lightfields.stanford.edu/index.html.)
[7] Kalantari N K, Wang T C, Ramamotorthi R.Learing-Based View Synthesis For Light Field Cameras [ J ]. ACM Transactions on Graphics,2016,35(6):1-10. (For learning-Based View Synthesis For Light Field Cameras [ J ]. ACM Graphics,2016,35 (6):1-10.)
Respectively recombining the light field images in the training image set and the test image set into a sub-aperture image array; considering that there is vignetting effect in the light field camera (appearing as low visual quality of the boundary sub-aperture image), the angular resolution of the light field image used for training and testing is clipped to 9 × 9, i.e. only the central high quality 9 × 9 view is taken; then, taking a 5 × 5 view of the center from the obtained light field image with the angular resolution of 9 × 9 to form a light field image with the angular resolution of 5 × 5, and performing spatial resolution downsampling on the light field image by using a bicubic interpolation method, wherein the downsampling scale is 8, namely the spatial resolution of the light field image is reduced to 1/8 of the original light field image, so as to obtain a light field image with low spatial resolution; taking the original light field image with the angular resolution of 5 multiplied by 5 as a reference high spatial resolution light field image (namely a label image); then, one sub-aperture image is selected from the initial 9 × 9 views (excluding the central 5 × 5 view) and the resolution is kept unchanged, so as to obtain a 2D high resolution image. Thus, the final training set includes an array of sub-aperture images recombined with 200Y-channel images of low spatial resolution light field images with angular resolution of 5 × 5, corresponding Y-channel images of 200 2D high resolution images, and corresponding Y-channel images of 200 reference high spatial resolution light field images; the final test set comprises a subaperture image array recombined by 70Y-channel images of low spatial resolution light field images with angular resolution of 5 x 5, corresponding Y-channel images of 70 2D high resolution images and corresponding 70 reference high spatial resolution light field images, wherein the 70 reference high spatial resolution light field images are not related to network inference or test and are only used for subsequent subjective visual comparison and objective quality evaluation.
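A possible sketch of the central-view cropping and bicubic spatial down-sampling described above is given below; the tensor layout (U, V, H, W), the helper name and the use of PyTorch's bicubic interpolation in place of the original interpolation routine are assumptions, and H and W are assumed divisible by the down-sampling scale:

```python
import torch
import torch.nn.functional as F

def prepare_low_resolution_views(sai: torch.Tensor, scale: int = 8, crop: int = 5):
    """Sketch of the data preparation step: keep the central crop x crop
    views of a (U, V, H, W) sub-aperture image stack and down-sample each
    view's spatial resolution by 'scale' with bicubic interpolation."""
    U, V, H, W = sai.shape
    u0, v0 = (U - crop) // 2, (V - crop) // 2
    center = sai[u0:u0 + crop, v0:v0 + crop]          # central crop x crop views
    lr = F.interpolate(center.reshape(crop * crop, 1, H, W),
                       scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False)
    return center, lr.reshape(crop, crop, H // scale, W // scale)
```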
When training the constructed spatial super-resolution network, the parameters of all convolution kernels are initialized with the MSRA initializer; the loss function is a combination of a pixel-domain L1-norm loss and a gradient loss; the network is trained with the ADAM optimizer. First, with a learning rate of 10⁻⁴, the encoder and decoder parts of the spatial super-resolution network are trained until they converge to a certain degree; then, with the learning rate again set to 10⁻⁴, the whole spatial super-resolution network is trained, and the learning rate is decayed by a scale factor of 0.5 after every 25 training epochs.
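A hedged sketch of this training configuration (pixel-domain L1 loss plus gradient loss, MSRA initialization, ADAM optimization, and step decay of the learning rate) is given below; the relative weight of the gradient term and the exact gradient operator are assumptions not specified in the text:

```python
import torch
import torch.nn.functional as F

def gradient_l1_loss(pred: torch.Tensor, target: torch.Tensor, grad_weight: float = 1.0):
    """Pixel-domain L1 loss plus an L1 loss on horizontal/vertical image
    gradients; grad_weight is an assumed hyper-parameter."""
    l1 = F.l1_loss(pred, target)
    dx_p, dy_p = pred[..., :, 1:] - pred[..., :, :-1], pred[..., 1:, :] - pred[..., :-1, :]
    dx_t, dy_t = target[..., :, 1:] - target[..., :, :-1], target[..., 1:, :] - target[..., :-1, :]
    grad = F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)
    return l1 + grad_weight * grad

# MSRA (He) initialization for all convolution kernels, e.g.:
# for m in network.modules():
#     if isinstance(m, torch.nn.Conv2d):
#         torch.nn.init.kaiming_normal_(m.weight)
#
# Optimizer and learning-rate schedule as described:
# optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)
```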
To illustrate the performance of the method of the present invention, it is compared with the existing bicubic interpolation method and six existing image super-resolution reconstruction methods: the method based on the deep back-projection network proposed by Haris et al., the method based on the deep Laplacian pyramid network proposed by Lai et al., the method based on spatial-angular separable convolution proposed by Yeung et al., the method based on the spatial-angular interaction network proposed by Wang et al., the method based on the two-stage network proposed by Jin et al., and the method based on hybrid input proposed by Boominathan et al. Among them, the methods of Haris et al. and Lai et al. are 2D image super-resolution reconstruction methods (applied independently to each sub-aperture image of the light field image), the methods of Yeung et al., Wang et al. and Jin et al. are ordinary light field image spatial super-resolution reconstruction methods, and the method of Boominathan et al. is a light field image spatial super-resolution reconstruction method using hybrid input.
Here, the objective quality evaluation indices used include PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and an advanced objective quality evaluation index for light field images (see Min X, Zhou J, Zhai G, et al. A Metric for Light Field Reconstruction, Compression, and Display Quality Evaluation [J]. IEEE Transactions on Image Processing, 2020, 29: 3790-). PSNR measures pixel-level reconstruction fidelity, and a higher value indicates better image quality; SSIM evaluates the objective quality of the super-resolution reconstructed image from the perspective of visual perception, takes values between 0 and 1, and a higher value indicates better image quality; the objective quality evaluation index for light field images evaluates the objective quality of the super-resolution reconstructed image by jointly measuring the spatial quality (texture and detail) and the angular quality (parallax structure) of the light field image, and a higher value likewise indicates better image quality.
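As a concrete reference for the first index, PSNR can be computed per sub-aperture image and averaged over the light field; a minimal sketch, assuming pixel values normalized to [0, 1]:

```python
import torch

def lf_psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Mean PSNR (dB) over all sub-aperture images of a (U, V, H, W) light
    field, assuming pixel values are normalized to [0, max_val]."""
    mse = ((pred - target) ** 2).flatten(2).mean(dim=-1)          # per-view MSE, shape (U, V)
    psnr = 10.0 * torch.log10(max_val ** 2 / mse.clamp_min(1e-12))
    return psnr.mean().item()
```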
Table 2 shows the comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index, Table 3 shows the comparison on the SSIM index, and Table 4 shows the comparison on the objective quality evaluation index for light field images. As can be seen from the objective data listed in Tables 2, 3 and 4, compared with the existing light field image spatial super-resolution reconstruction methods (including the 2D image super-resolution reconstruction methods), the method of the present invention obtains higher quality scores on all three objective quality evaluation indices used, significantly higher than all comparison methods, which indicates that the method of the present invention can effectively reconstruct the texture and detail information of the light field image while recovering a better parallax structure; in particular, for light field image databases with different baseline ranges and scene contents, the method of the present invention achieves the best super-resolution reconstruction effect, which shows that it can handle both narrow-baseline and wide-baseline light field data well and has good robustness to scene content.
TABLE 2 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index
TABLE 3 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the SSIM index
TABLE 4 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the objective quality evaluation index for light field images
FIG. 5a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display; FIG. 5b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of Wang et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display; fig. 5i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the EPFL light field image database under test, where the sub-aperture image in the central coordinate is taken for presentation.
FIG. 6a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using a bicubic interpolation method, where a sub-aperture image in a central coordinate is taken for display; FIG. 6b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 6c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display; FIG. 6f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image at a central coordinate is taken for display; fig. 6i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the STFLytro light field image database under test, here shown as a sub-aperture image in central coordinates.
Comparing fig. 5a to 5h with fig. 5i, and comparing fig. 6a to 6h with fig. 6i, respectively, it can be clearly seen that, with the existing light field image spatial super-resolution reconstruction methods, including the 2D image super-resolution reconstruction method, the reconstructed high spatial resolution light field image cannot recover the texture and detail information of the image, as shown in the lower left rectangular frame enlarged region in fig. 5a to 5f, and the lower right rectangular frame enlarged region in fig. 6a to 6 f; using the hybrid input light field image spatial super resolution reconstruction method achieves relatively better results but still contains some blurring artifacts as shown by the lower left rectangular box magnified region in fig. 5g and the lower right rectangular box magnified region in fig. 6 g; in contrast, the high spatial resolution light field image reconstructed by the method of the present invention has clear texture and rich details, and is close to the label high spatial resolution light field image (i.e. fig. 5i and fig. 6i) in subjective visual perception, which indicates that the method of the present invention can effectively recover the texture information of the light field image. In addition, by reconstructing each sub-aperture image with high quality, the method of the invention can well ensure the parallax structure of the finally reconstructed high-spatial-resolution light field image.
The innovation of the method is mainly as follows: firstly, acquiring abundant 2D spatial information while capturing high-dimensional light field data through heterogeneous imaging, namely capturing a light field image and a 2D high-resolution image simultaneously, further effectively improving the spatial resolution of the light field image by utilizing the information of the 2D high-resolution image, and recovering corresponding textures and details; secondly, in order to establish and explore the relation between the light field image and the 2D high-resolution image, the method respectively constructs an aperture-level feature registration module and a light field feature enhancement module, wherein the aperture-level feature registration module can accurately register 2D high-resolution information and 4D light field image information, and the light field feature enhancement module can consistently enhance visual information in light field features by using high-resolution feature information obtained by registration on the basis to obtain enhanced high-resolution light field features; and thirdly, a flexible pyramid reconstruction mode is adopted, namely the spatial resolution of the light field image is gradually improved and an accurate parallax structure is recovered by a coarse-to-fine reconstruction strategy, and then a multi-scale super-resolution result can be reconstructed in one-time forward inference. In addition, to reduce the number of parameters and training burden of the pyramid network, weight sharing is performed at each pyramid level.
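A schematic sketch of the weight-shared pyramid inference described in the third point is given below; the calling convention of the per-level network and the preparation of the 2D guidance image are simplified assumptions:

```python
import torch
import torch.nn.functional as F

def pyramid_super_resolve(net, lf_lr, hr_guidance, levels: int = 3, scale_per_level: int = 2):
    """Schematic sketch of the weight-shared pyramid: the same spatial
    super-resolution network 'net' is applied once per level, each call
    receiving the previous level's output together with the 2D
    high-resolution guidance image bicubically resized to that level.
    The (light field, guidance) calling convention of 'net' is an assumption."""
    outputs = []
    current = lf_lr
    for level in range(levels):
        factor = scale_per_level ** (levels - 1 - level)
        guidance = F.interpolate(hr_guidance, scale_factor=1.0 / factor,
                                 mode="bicubic", align_corners=False)
        current = net(current, guidance)   # shared weights at every pyramid level
        outputs.append(current)            # multi-scale results from one forward pass
    return outputs
```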
Claims (3)
1. A light field image space super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution of W multiplied by H and angular resolution of V multiplied by U, corresponding Num color three-channel 2D high-resolution images with resolution of alpha W multiplied by alpha H, and corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution of alpha W multiplied by alpha H and angular resolution of V multiplied by U; wherein Num is more than 1, alpha represents the spatial resolution improvement multiple, and the value of alpha is more than 1;
step 2: constructing a convolutional neural network as a spatial super-resolution network: the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high-resolution features, a shallow layer feature extraction layer for extracting shallow layer features from a low spatial resolution light field image, a light field feature enhancement module for fusing the light field features and the 2D high-resolution features, a spatial attention block for relieving registration errors in the coarse-scale features, and a decoder for reconstructing potential features into the light field image;
for the encoder, the encoder is composed of a first convolution layer, a second convolution layer, a first residual block and a second residual block which are connected in sequence, wherein the input end of the first convolution layer receives three inputs in parallel, and each input is a frame with spatial resolution of W × H and angle divisionSingle-channel image L of low-spatial-resolution light field image with resolution V multiplied by ULRThe width of the image reconstruction obtained after the spatial resolution up-sampling is alphasW x V and height of alphasH × U subaperture image array, which is denoted asA width of alphasW and a height of alphasThe single-channel image of the blurred 2D high-resolution image of H is described asAnd a width of alphasW and a height of alphasSingle channel image of H2D high resolution image, denoted as IHRThe output end of the first convolution layer is directed toOutput 64 frames with width alphasW x V and height of alphasH × U signature graph, will be directed toThe set of all the output feature maps is denoted asOutput terminal of the first winding layer is aimed atOutput 64 frames with width alphasW and a height of alphasH characteristic diagram, will be directed toThe set of all the output feature maps is denoted asOutput terminal of the first convolution layer is directed to IHROutput 64 frames with width alphasW and a height of alphasH signature of H will be directed to IHRThe set of all the output feature maps is denoted as YHR,0(ii) a The input terminal of the second convolutional layer receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,0All feature maps in (1), the output of the second convolutional layer being directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput terminal of the second convolution layer is aimed atOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asThe output end of the second convolution layer is directed to YHR,0Output 64 frames with width ofAnd has a height ofWill be directed to YHRAnd the set of all the characteristic diagrams output by 0 is marked as YHR,1(ii) a The input terminal of the first residual block receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,1The output of the first residual block is directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the first residual block is directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the first residual block for YHR,1Output 64 frames with width ofAnd has a height ofWill be directed to YHR,1The set of all the output feature maps is denoted as YHR,2(ii) a The input terminal of the second residual block receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,2Of the second residual block, the output of the second residual block being directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput pair of second residual blockOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the second residual block for 
YHR,2Output 64 frames with width ofAnd has a height ofWill be directed to YHR,2The set of all the output feature maps is denoted as YHR,3(ii) a Wherein,is a single-channel image L of a low spatial resolution light-field image with spatial resolution W × H and angular resolution V × ULRThe width of the image recombination obtained after the bicubic interpolation up-sampling is alphasW x V and height of alphasAn array of H U sub-aperture images,to pass through the pair IHRFirstly carrying out bicubic interpolation downsampling and then carrying out bicubic interpolation upsampling to obtain alphasRepresenting a spatial resolution sampling factor, alphas 3Alpha, the up-sampling factor of the up-sampling of the bicubic interpolation and the down-sampling factor of the down-sampling of the bicubic interpolation both take the value of alphasThe size of the convolution kernel of the first convolution layer is 3 × 3, the convolution step is 1, the number of input channels is 1, the number of output channels is 64, the size of the convolution kernel of the second convolution layer is 3 × 3, the convolution step is 2, the number of input channels is 64, the number of output channels is 64, and the activation functions adopted by the first convolution layer and the second convolution layer are both 'ReLU';
for the aperture level feature registration module, the input end of the aperture level feature registration module receives three types of feature maps, wherein the first type isAll characteristic diagrams in (1), the second class isThe third class includes four inputs, respectively YHR,0All feature maps in (1), YHR,1All feature maps in (1), YHR,2All feature maps in (1), YHR,3All feature maps in (1); in the aperture level feature registration module, first, the image data is processedAll feature maps in (1), YHR,0All feature maps in (1), YHR,1All feature maps in (1), YHR,2All ofFeature map and YHR,3All feature maps in (1) are each replicated by a factor of V × U, so thatAll feature maps in (1), YHR,1All feature maps in (1), YHR,2All feature maps and Y in (1)HR,3Becomes the width of all the feature maps inAnd the height becomesI.e. to obtain the dimensions andand matching the size of the feature map in (1) with YHR,0Becomes asW x V and height becomes alphasH × U, i.e. to size andthe dimensions of the feature maps in (1) match; then toAll characteristic figures in (1) andall the characteristic diagrams in the method are subjected to block matching, and a width of the characteristic diagram is obtained after the block matching is finishedAnd has a height ofIs marked as PCI(ii) a Then according to PCIIs a reaction of YHR,1All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,1(ii) a Also according to PCIIs a reaction of YHR,2All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,2(ii) a According to PCIIs a reaction of YHR,3All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,3(ii) a For P againCIPerforming bicubic interpolation up-sampling to obtain a frame with width alphasW is multiplied by V andheight of alphasH × U coordinate index diagram, notedFinally according toWill YHR,0All the characteristic diagrams in (1) andall feature maps in the image are registered in space position to obtain 64 pieces of width alphasW x V and height of alphasH × U registration feature map, and F represents a set of all the obtained registration feature mapsAlign,0(ii) a Output F of aperture level feature registration moduleAlign,0All characteristic diagrams in (1), FAlign,1All characteristic diagrams in (1), FAlign,2All feature maps and F in (1)Align,3All feature maps in (1); wherein, the precision measurement index for block matching is a texture and structure similarity index, the size of the block for block matching is 3 multiplied by 3, and the up-sampling factor of the bicubic interpolation up-sampling is alphas;
For the shallow feature extraction layer, it is composed of 1 fifth convolution layer, the input end of which receives a single-channel image L of a low spatial resolution light field image with spatial resolution WxH and angular resolution VxULRThe output end of the fifth convolution layer outputs 64 characteristic diagrams with the width of W multiplied by V and the height of H multiplied by U, and the set formed by all the output characteristic diagrams is denoted as FLR(ii) a The convolution kernel of the fifth convolution layer has a size of 3 × 3, a convolution step size of 1, a number of input channels of 1, a number of output channels of 64, and the activation function adopted by the fifth convolution layer is "ReLU";
for the light field characteristic enhancement module, the light field characteristic enhancement module consists of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block which are connected in sequence, wherein the input end of the first enhancement residual block receives FAlign,1All feature maps and F in (1)LROf 64 width at the output of the first enhancement residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,1(ii) a The input of the second enhanced residual block receives FAlign,2All feature maps and F in (1)En,1Of 64 widths at the output of the second enhanced residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,2(ii) a The input of the third enhanced residual block receives FAlign,3All feature maps and F in (1)En,2Of 64 width at the output of the third enhanced residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,3;
For a spatial attention block, which consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence, the input of the sixth convolutional layer receives FAlign,0The output end of the sixth convolutional layer outputs 64 characteristic graphs with the width of alphasW x V and height of alphasH × U spatial attention feature map, and F represents a set of all output spatial attention feature mapsSA1(ii) a Input terminal of seventh convolution layer receiving FSA1In (1)All spatial attention feature maps, the output end of the seventh convolutional layer outputs 64 width alphasW x V and height of alphasH × U spatial attention feature map, and F represents a set of all output spatial attention feature mapsSA2(ii) a F is to beAlign,0All feature maps in (1) and (F)SA2Multiplying all the spatial attention feature maps element by element, and recording the set formed by all the obtained feature maps as FWA,0(ii) a F is to beWA,0As all feature maps output by the output end of the spatial attention block; the sizes of convolution kernels of the sixth convolution layer and the seventh convolution layer are both 3 multiplied by 3, convolution step lengths are both 1, the number of input channels is 64, the number of output channels is 64, the activation function adopted by the sixth convolution layer is 'ReLU', and the activation function adopted by the seventh convolution layer is 'Sigmoid';
for the decoder, the decoder is composed of a third residual block, a fourth residual block, a sub-pixel convolution layer, an eighth convolution layer and a ninth convolution layer which are connected in sequence, wherein the input end of the third residual block receives FEn,3Of 64 widths at the output of the third residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDec,1(ii) a The input of the fourth residual block receives FDec,1Of 64 width at the output of the fourth residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDec,2(ii) a Input terminal of sub-pixel convolution layer receiving FDec,2All characteristic diagrams in (1)The output end of the sub-pixel convolution layer outputs 256 widthsAnd has a height ofAnd 256 widths are set asAnd has a height ofFurther converting the feature map into 64 pieces with the width alphasW x V and height of alphasH × U feature graph, and F represents a set of all converted feature graphsDec,3(ii) a Input terminal of eighth convolution layer receiving FDec,3All feature maps in (1) and (F)WA,0The result of element-by-element addition of all the feature maps in (1), the output end of the eighth convolutional layer outputs 64 width alphasW x V and height of alphasH × U feature map, and F represents a set of all output feature mapsDec,4(ii) a Input terminal of the ninth convolutional layer receives FDec,4The output end of the ninth convolutional layer outputs a characteristic diagram with a width of alphasW x V and height of alphasH multiplied by U, the single-channel light field image is reconstructed, and the width is alphasW x V and height of alphasReconstruction of H multiplied by U single-channel light field image into alpha-space resolutionsW×αsH and high spatial resolution single-channel light field image with angular resolution of V multiplied by U, which is recorded as LSR(ii) a The convolution kernel of the sub-pixel convolution layer has the size of 3 multiplied by 3, the convolution step is 1, the number of input channels is 64, the number of output channels is 256, the convolution kernel of the eighth convolution layer has the size of 3 multiplied by 3, the convolution step is 1, the number of input channels is 64, the number of output channels is 64, the convolution kernel of the ninth convolution layer has the size of 1 multiplied by 1, the convolution step is 1, the number of input channels is 64, the number of output channels is 1, and excitation adopted by the sub-pixel convolution layer and the eighth convolution layerThe active functions are all 'ReLU', and the ninth convolution layer does not adopt the active function;
Step 3: performing color space conversion on each low-spatial-resolution light field image in the training set, its corresponding 2D high-resolution image and its corresponding reference high-spatial-resolution light field image, i.e. converting them from the RGB color space to the YCbCr color space, and extracting the Y-channel images; reorganizing the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U for representation; the sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images in the training set, the Y-channel images of the corresponding 2D high-resolution images and the Y-channel images of the corresponding reference high-spatial-resolution light field images then constitute the training set; next, constructing a pyramid network and training it with the training set, the specific process being as follows:
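The sketch below illustrates the two preprocessing operations of this step, Y-channel extraction and sub-aperture image array reorganization. The BT.601 luma weights and the (U, V, H, W) input layout are assumptions for illustration; the claim only specifies an RGB→YCbCr conversion and an array of width W×V and height H×U.

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """ITU-R BT.601 luma, a common choice for the Y channel of YCbCr.
    img: float array in [0, 1] with shape (..., 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_sub_aperture_array(lf_y: np.ndarray) -> np.ndarray:
    """Reorganize a Y-channel light field of shape (U, V, H, W) into a
    sub-aperture image array of height H*U and width W*V (U x V tiled views)."""
    U, V, H, W = lf_y.shape
    return lf_y.transpose(0, 2, 1, 3).reshape(U * H, V * W)
```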
Step 3_1: copying the constructed spatial super-resolution network three times and cascading the copies, with the three spatial super-resolution networks sharing weights, i.e. all their parameters are identical; the overall network formed by the three spatial super-resolution networks is defined as the pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to the value of αs;
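The weight sharing in step 3_1 simply means one network object is reused at every pyramid level, as in the sketch below. The call signature of `sr_net` (light field array, 2D high-resolution Y image, blurred Y image) is our assumption, based on the inputs listed in steps 3_2 to 3_4 and step 4.

```python
import torch.nn as nn

class PyramidNetwork(nn.Module):
    """Sketch of the three-level pyramid: the same spatial super-resolution
    network (shared weights) is applied once per level, each level upscaling
    by the reconstruction scale alpha_s."""
    def __init__(self, sr_net: nn.Module, levels: int = 3):
        super().__init__()
        self.sr_net = sr_net          # one module reused at every level = weight sharing
        self.levels = levels

    def forward(self, lf_y, hr_y_per_level, blurred_hr_y_per_level):
        out = lf_y
        for level in range(self.levels):
            out = self.sr_net(out, hr_y_per_level[level], blurred_hr_y_per_level[level])
        return out
```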
Step 3_2: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set twice, and taking the down-sampled images as label images; down-sampling the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and taking the down-sampled images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network; then inputting, into the first spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays reorganized from the images obtained by one spatial-resolution up-sampling of those Y-channel images, all 2D high-resolution Y-channel images for the first spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
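The blurred 2D high-resolution Y-channel images used as network inputs in steps 3_2 to 3_4 (and again in step 4) are produced by one bicubic down-sampling followed by one bicubic up-sampling at scale αs, for example as in the sketch below; the function name and the example scale of 2 are ours.

```python
import torch
import torch.nn.functional as F

def blurred_hr(y_hr: torch.Tensor, alpha_s: int = 2) -> torch.Tensor:
    """One bicubic down-sampling followed by one bicubic up-sampling at scale
    alpha_s, returning a blurred image of the original size.
    y_hr: tensor of shape (N, 1, H, W)."""
    n, c, h, w = y_hr.shape
    low = F.interpolate(y_hr, size=(h // alpha_s, w // alpha_s),
                        mode="bicubic", align_corners=False)
    return F.interpolate(low, size=(h, w), mode="bicubic", align_corners=False)
```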
Step 3_3: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set once, and taking the down-sampled images as label images; down-sampling the Y-channel image of each 2D high-resolution image in the training set once in the same way, and taking the down-sampled images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network; then inputting, into the second spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays reorganized from the αs-fold reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the blurred versions of these reorganized sub-aperture image arrays obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling, all 2D high-resolution Y-channel images for the second spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs²-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
Step 3_4: taking the Y-channel image of each reference high-spatial-resolution light field image in the training set as a label image; taking the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then inputting, into the third spatial super-resolution network of the pyramid network for training, the sub-aperture image arrays reorganized from the αs²-fold reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the blurred versions of these reorganized sub-aperture image arrays obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling, all 2D high-resolution Y-channel images for the third spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs³-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
After training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, yielding the trained spatial super-resolution network model;
Step 4: randomly selecting a three-color-channel low-spatial-resolution light field image and its corresponding three-color-channel 2D high-resolution image as test images; then converting both from the RGB color space to the YCbCr color space and extracting the Y-channel images; reorganizing the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array for representation; inputting the Y-channel image of the low-spatial-resolution light field image (as a sub-aperture image array), the Y-channel image of the 2D high-resolution image, and the blurred 2D high-resolution Y-channel image obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the Y-channel image of the 2D high-resolution image, into the trained spatial super-resolution network model for testing, thereby obtaining the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image; then up-sampling the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image by bicubic interpolation, obtaining the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image corresponding to the Cb-channel and Cr-channel images of the low-spatial-resolution light field image; finally, concatenating the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension and converting the result back to the RGB color space, giving the reconstructed three-color-channel high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
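A sketch of this test procedure is given below. It assumes 4-D tensors and a particular call signature for the trained model, and it leaves out the YCbCr↔RGB conversions and the sub-aperture reorganization, which would wrap around it; all names are illustrative.

```python
import torch.nn.functional as F

def super_resolve_color_lf(y_lf, cb_lf, cr_lf, y_hr2d, y_hr2d_blur, model, alpha_s=2):
    """Hypothetical test-time pipeline (names and tensor layout assumed).
    y_lf, cb_lf, cr_lf: YCbCr channels of the low-resolution light field, each a
    (N, 1, H, W) tensor already arranged as a sub-aperture image array;
    y_hr2d / y_hr2d_blur: the 2D high-resolution Y image and its blurred version;
    model: the trained spatial super-resolution network."""
    y_sr = model(y_lf, y_hr2d, y_hr2d_blur)          # network reconstructs the Y channel
    cb_sr = F.interpolate(cb_lf, scale_factor=alpha_s, mode="bicubic", align_corners=False)
    cr_sr = F.interpolate(cr_lf, scale_factor=alpha_s, mode="bicubic", align_corners=False)
    # The three channels would then be concatenated and converted back to RGB.
    return y_sr, cb_sr, cr_sr
```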
2. The light field image spatial super-resolution reconstruction method according to claim 1, wherein in step 2 the first, second, third and fourth residual blocks have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence;
the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in each of the two feature-map sets specified for it in step 2, and all feature maps in YHR,1; for each of these three inputs, the third convolutional layer outputs 64 feature maps of the same width and height as that input, and the set of feature maps output for each input is recorded separately; the fourth convolutional layer in the first residual block receives these three sets in parallel and, for each of them, outputs 64 feature maps of the same width and height, again recorded as three separate sets; each of the three sets output by the fourth convolutional layer is then added element by element to the corresponding input of the first residual block, and the three sets of feature maps so obtained are the outputs of the first residual block for its three inputs; the set obtained for YHR,1 is denoted YHR,2;
the second residual block is connected in exactly the same way: its third convolutional layer receives in parallel the three sets output by the first residual block (including YHR,2), its fourth convolutional layer processes the three resulting sets, and each result is added element by element to the corresponding input of the second residual block; the set obtained for YHR,2 is denoted YHR,3;
the third residual block has a single input: its third convolutional layer receives all feature maps in FEn,3 and outputs 64 feature maps of the same width and height, its fourth convolutional layer receives these and outputs 64 feature maps of the same width and height, and all feature maps in FEn,3 are added element by element to the output of the fourth convolutional layer; all feature maps so obtained are the output of the third residual block, and the set they form is FDec,1; the fourth residual block is identical except that its input is FDec,1 and the set formed by its output feature maps is FDec,2;
in the above, the convolution kernels of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks are all 3×3, the convolution stride is 1, and the numbers of input and output channels are both 64; in each residual block the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
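As an illustration, the first and second residual blocks can be read as one shared pair of convolutions applied to three parallel streams, roughly as in the sketch below (PyTorch, names and padding assumed):

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of the first/second residual blocks: one pair of 3x3 convolutions
    (ReLU after the first, none after the second) is applied with shared weights
    to three parallel streams, each stream keeping its own skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, streams):
        # `streams` is a tuple of three tensors, e.g. (f_a, f_b, y_hr)
        return tuple(x + self.conv4(torch.relu(self.conv3(x))) for x in streams)
```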
3. The light field image spatial super-resolution reconstruction method according to claim 1 or 2, characterized in that in step 2 the first, second and third enhanced residual blocks have the same structure, each consisting of a first spatial feature transform layer, a first spatial-angle convolutional layer, a second spatial feature transform layer, a second spatial-angle convolutional layer and a channel attention layer connected in sequence; the first and second spatial feature transform layers have the same structure, each composed of a tenth convolutional layer and an eleventh convolutional layer in parallel; the first and second spatial-angle convolutional layers have the same structure, each composed of a twelfth convolutional layer and a thirteenth convolutional layer; the channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence;
The input of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in FAlign,1, and its output is 64 feature maps of the same width and height as its input, recorded as a set; the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in FAlign,1, and its output is likewise 64 feature maps of the same width and height, recorded as a set; the input of the first spatial feature transform layer in the first enhanced residual block receives all feature maps in FLR; all feature maps in FLR are multiplied element by element with the feature maps output by the tenth convolutional layer, the result is added element by element to the feature maps output by the eleventh convolutional layer, and all feature maps so obtained form the output of the first spatial feature transform layer in the first enhanced residual block;
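This multiply-then-add modulation is the familiar spatial feature transform; a sketch follows. Assigning the scale branch to the tenth convolutional layer and the shift branch to the eleventh is our assumption, since the claim only fixes the two parallel convolutions.

```python
import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    """Sketch of a spatial feature transform layer: two parallel 3x3 convolutions
    over the aligned guidance features predict a per-pixel scale and shift, which
    modulate the light-field features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv10 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv11 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, f_lf: torch.Tensor, f_align: torch.Tensor) -> torch.Tensor:
        scale = self.conv10(f_align)
        shift = self.conv11(f_align)
        return f_lf * scale + shift     # element-wise modulation of the LF features
```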
The input of the twelfth convolutional layer of the first spatial-angle convolutional layer in the first enhanced residual block receives all feature maps output by the first spatial feature transform layer, and its output is 64 feature maps of the same width and height as its input; these feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer of the same spatial-angle convolutional layer receives the result of this reorganization, and its output is 64 feature maps, which are reorganized from the angular dimension back to the spatial dimension; all feature maps obtained after this reorganization are the output of the first spatial-angle convolutional layer in the first enhanced residual block, recorded as a set;
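The "recombination from the spatial dimension to the angular dimension" is, in our reading, the usual rearrangement between a sub-aperture image tiling and a macro-pixel tiling; a sketch of this rearrangement and of the resulting spatial-angle convolutional layer follows. U = V = 5 is only an example angular resolution, and the exact layout is an assumption.

```python
import torch
import torch.nn as nn

def spatial_to_angular(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """From the sub-aperture (spatial) arrangement, shape (N, C, U*H, V*W), to the
    macro-pixel (angular) arrangement, shape (N, C, H*U, W*V), so that a
    convolution sees angular neighbours."""
    n, c, uh, vw = x.shape
    h, w = uh // U, vw // V
    x = x.view(n, c, U, h, V, w)
    return x.permute(0, 1, 3, 2, 5, 4).reshape(n, c, h * U, w * V)

def angular_to_spatial(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Inverse rearrangement, from the macro-pixel back to the sub-aperture layout."""
    n, c, hu, wv = x.shape
    h, w = hu // U, wv // V
    x = x.view(n, c, h, U, w, V)
    return x.permute(0, 1, 3, 2, 5, 4).reshape(n, c, U * h, V * w)

class SpatialAngleConv(nn.Module):
    """Conv in the spatial layout, reorganize to the angular layout, conv there,
    then reorganize back (a sketch of the twelfth/thirteenth convolutional layers)."""
    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.U, self.V = U, V
        self.conv12 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv13 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv12(x))
        x = spatial_to_angular(x, self.U, self.V)
        x = torch.relu(self.conv13(x))
        return angular_to_spatial(x, self.U, self.V)
```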
The tenth and eleventh convolutional layers of the second spatial feature transform layer in the first enhanced residual block each receive all feature maps in FAlign,1 and each output 64 feature maps of the same width and height as their input, recorded as two sets; the input of the second spatial feature transform layer receives all feature maps output by the first spatial-angle convolutional layer; these feature maps are multiplied element by element with the feature maps output by the tenth convolutional layer, the result is added element by element to the feature maps output by the eleventh convolutional layer, and all feature maps so obtained form the output of the second spatial feature transform layer in the first enhanced residual block;
the second spatial-angle convolutional layer in the first enhanced residual block processes this output in the same way as the first spatial-angle convolutional layer: its twelfth convolutional layer outputs 64 feature maps, which are reorganized from the spatial dimension to the angular dimension, its thirteenth convolutional layer outputs 64 feature maps from the reorganized input, and these are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization form the output of the second spatial-angle convolutional layer in the first enhanced residual block;
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receives all feature maps output by the second spatial-angle convolutional layer, and its output is 64 feature maps in which all values of each feature map are equal (the global mean of the corresponding input feature map); the set of these feature maps is denoted FGAP,1; the input of the fourteenth convolutional layer in this channel attention layer receives all feature maps in FGAP,1, and its output is 4 feature maps, whose set is denoted FDS,1; the input of the fifteenth convolutional layer receives all feature maps in FDS,1, and its output is 64 feature maps, whose set is denoted FUS,1; all feature maps in FUS,1 are multiplied element by element with all feature maps output by the second spatial-angle convolutional layer, and all feature maps so obtained form the output of the channel attention layer in the first enhanced residual block; the set they form is denoted FCA,1;
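A sketch of the channel attention layer, using the kernel sizes and channel counts given at the end of this claim (1×1 convolutions, 64→4→64 channels, ReLU then Sigmoid), is given below; the PyTorch module and names are ours.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch: global mean pooling, a 1x1 conv down to 4 channels (ReLU), a 1x1
    conv back to 64 channels (Sigmoid), then channel-wise rescaling of the input."""
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)               # global mean per feature map
        self.conv14 = nn.Conv2d(channels, reduced, 1)
        self.conv15 = nn.Conv2d(reduced, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                                  # F_GAP: one value per channel
        w = torch.relu(self.conv14(w))                    # F_DS
        w = torch.sigmoid(self.conv15(w))                 # F_US
        return x * w                                      # channel-wise reweighting (F_CA)
```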
All feature maps in FCA,1 are added element by element to all feature maps in FLR, and all feature maps so obtained are the feature maps output by the first enhanced residual block; the set they form is FEn,1;
The second enhanced residual block is connected in the same way, with FAlign,2 as the guidance input of its two spatial feature transform layers and FEn,1 as the feature input of its first spatial feature transform layer: the tenth and eleventh convolutional layers of its first spatial feature transform layer each receive all feature maps in FAlign,2, all feature maps in FEn,1 are multiplied element by element with the output of the tenth convolutional layer and the result is added element by element to the output of the eleventh convolutional layer; the first spatial-angle convolutional layer, the second spatial feature transform layer (again guided by FAlign,2), the second spatial-angle convolutional layer and the channel attention layer then operate exactly as in the first enhanced residual block, with the intermediate sets of the channel attention layer denoted FGAP,2 (64 feature maps in which all values of each map are equal), FDS,2 (4 feature maps) and FUS,2 (64 feature maps), and its output denoted FCA,2; all feature maps in FCA,2 are added element by element to all feature maps in FEn,1, and all feature maps so obtained are the feature maps output by the second enhanced residual block; the set they form is FEn,2;
The third enhanced residual block is likewise identical in structure, with FAlign,3 as the guidance input of its two spatial feature transform layers and FEn,2 as the feature input of its first spatial feature transform layer; the intermediate sets of its channel attention layer are denoted FGAP,3, FDS,3 and FUS,3, and the output of the channel attention layer is denoted FCA,3; all feature maps in FCA,3 are added element by element to all feature maps in FEn,2, and all feature maps so obtained are the feature maps output by the third enhanced residual block; the set they form is FEn,3;
In the above, in each of the first, second and third enhanced residual blocks: the convolution kernels of the tenth and eleventh convolutional layers are 3×3 with a convolution stride of 1, 64 input channels and 64 output channels, and no activation function; the convolution kernels of the twelfth and thirteenth convolutional layers are 3×3 with a convolution stride of 1, 64 input channels and 64 output channels, and the "ReLU" activation function; the convolution kernel of the fourteenth convolutional layer is 1×1 with a convolution stride of 1, 64 input channels and 4 output channels, and the "ReLU" activation function; the convolution kernel of the fifteenth convolutional layer is 1×1 with a convolution stride of 1, 4 input channels and 64 output channels, and the "Sigmoid" activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111405987.1A CN114359041A (en) | 2021-11-24 | 2021-11-24 | Light field image space super-resolution reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114359041A true CN114359041A (en) | 2022-04-15 |
Family
ID=81096214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111405987.1A Pending CN114359041A (en) | 2021-11-24 | 2021-11-24 | Light field image space super-resolution reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359041A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200402205A1 (en) * | 2019-06-18 | 2020-12-24 | Huawei Technologies Co., Ltd. | Real-time video ultra resolution |
CN112381711A (en) * | 2020-10-27 | 2021-02-19 | 深圳大学 | Light field image reconstruction model training and rapid super-resolution reconstruction method |
CN112950475A (en) * | 2021-03-05 | 2021-06-11 | 北京工业大学 | Light field super-resolution reconstruction method based on residual learning and spatial transformation network |
CN113139898A (en) * | 2021-03-24 | 2021-07-20 | 宁波大学 | Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning |
Non-Patent Citations (1)
Title |
---|
DENG Wu et al., "Light field super-resolution reconstruction fusing global and local viewpoints", Application Research of Computers, vol. 36, no. 5, 31 May 2019 (2019-05-31), pages 1549-1559 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116309067A (en) * | 2023-03-21 | 2023-06-23 | 安徽易刚信息技术有限公司 | Light field image space super-resolution method |
CN116309067B (en) * | 2023-03-21 | 2023-09-29 | 安徽易刚信息技术有限公司 | Light field image space super-resolution method |
CN117475088A (en) * | 2023-12-25 | 2024-01-30 | 浙江优众新材料科技有限公司 | Light field reconstruction model training method based on polar plane attention and related equipment |
CN117475088B (en) * | 2023-12-25 | 2024-03-19 | 浙江优众新材料科技有限公司 | Light field reconstruction model training method based on polar plane attention and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||