CN116957057A - Multi-view information interaction-based light field image super-resolution network generation method - Google Patents
Multi-view information interaction-based light field image super-resolution network generation method
- Publication number
- CN116957057A (application CN202310410553.3A)
- Authority
- CN
- China
- Prior art keywords
- light field
- convolution
- image
- resolution
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a light field image super-resolution network generation method based on multi-view information interaction, belongs to the technical field of image super-resolution reconstruction, and aims to solve the problem of insufficient spatial resolution of light field images. The invention adopts a generative adversarial network as the overall structure to improve network performance. Decomposition kernel convolution is adopted, in view of the specific structure of light field data, to extract the spatial-angular information of the light field image, and it is combined with a channel attention mechanism to form a channel attention residual module based on decomposition kernels. The generator of the proposed network uses a plurality of such decomposition kernel channel attention residual modules connected in series, and an input dense residual structure is introduced in this part to realize feature extraction. The discriminator likewise uses decomposition kernel convolution to extract features and output the final discrimination result. The network can extract richer information from the light field data to realize light field super-resolution reconstruction.
Description
Technical Field
The invention belongs to the technical field of computational imaging, and particularly relates to a light field image super-resolution network generation method based on multi-view information interaction.
Background
With the development of technology, the field of camera imaging has advanced rapidly. When a traditional camera photographs an object, it only records the 2D projection of the light rays, so a large amount of the spatial and angular information of the light field is lost; to acquire the missing information, the scene must be photographed multiple times from different positions to obtain multi-view images. Compared with a traditional camera, a light field camera can obtain multi-view images of a scene in a single shot and acquire the spatial and angular information of the light field at the same time; after processing, the depth information, refocused view images and so on of the scene can be obtained, so its imaging efficiency is higher than that of a traditional camera. However, the cost of acquiring angular information with a light field camera is that the spatial resolution of each single view image is sacrificed, and this lower spatial resolution has a considerable influence on many kinds of data processing. Therefore, super-resolution technology for light field images is of great significance.
In recent years, thanks to the development of convolutional neural networks (Convolutional Neural Network, CNN) and the appearance of light field data sets, deep learning-based approaches have performed well in light field reconstruction. In 2016, Yoon et al. used a CNN for light field super-resolution reconstruction for the first time; the method used two cascaded convolutional neural networks to perform spatial and angular super-resolution reconstruction of light field images respectively. In 2017, Farrugia et al. proposed a dictionary learning-based method to learn the mapping between light field low-resolution and high-resolution images. In the same year, Gaochang Wu et al. of Tsinghua University studied a light field super-resolution reconstruction technique based on epipolar plane images (EPI). In 2018, Wang et al. designed a bidirectional recurrent CNN, used it to super-resolve the horizontal and vertical image stacks, and finally unified the stacks through stacked generalization to obtain the complete view images. Zhang et al. then used a multi-branch residual network to achieve spatial super-resolution reconstruction of the light field: the inputs of the different branches were sub-aperture images stacked in different directions, so as to learn the correlation between the sub-aperture images in different directions, and the extracted features were finally fused to reconstruct the light field. In 2020, Chen et al. applied the structure of the generative adversarial network (Generative Adversarial Network, GAN) to light field super-resolution reconstruction, and also proposed an EPI loss function to reduce the gap between the reconstruction result and the real light field.
On the other hand, improved convolution methods have also been proposed in recent years. In 2019, Meng et al. proposed a convolution method for the four-dimensional data of a light field, namely 4D convolution, which extracts the spatial and angular information of the light field simultaneously through high-dimensional convolution. In the same year, Yeung et al. proposed spatial-angular separable (SAS) convolution to address the low computational efficiency of 4D convolution, greatly improving the computation speed of the network while keeping performance on par with 4D convolution. In 2022, Hu et al. carried out a further dimension-reduction analysis of light field data on the basis of separable convolution and put forward the concept of the decomposition kernel; this convolution method can extract information from the four EPI subspaces of the light field as well as from the angular and spatial subspaces as in SAS convolution. Apart from separable convolution methods, Wang et al. in 2020 proposed a spatial feature extractor (SFE) and an angular feature extractor (AFE), which extract the spatial and angular information of a light field image respectively by changing the shape of the 2D convolution. In 2022, the same authors proposed horizontal and vertical EPI feature extractors on the basis of the SFE and AFE, further improving network performance. It is worth mentioning that at present the only improvement of light field GANs in the convolution direction is the LightGAN proposed by Meng et al. in 2020, which replaces the 2D convolution in the GAN network with 4D convolution to extract the high-dimensional information of the light field data and realize light field reconstruction.
In recent years, deep learning has made great progress in the field of light field image super-resolution, but the information in light field data is still insufficiently utilized; in particular, many methods pay no attention to the information of the light field data in the channel dimension during super-resolution.
Disclosure of Invention
Aiming at the problems of existing deep learning methods, the invention provides a light field image super-resolution network generation method based on multi-view information interaction. When performing light field image super-resolution, the network can more fully extract the complementary information between images of different view angles of the light field and apply it to super-resolution reconstruction, and it introduces a channel attention mechanism and a generative adversarial network structure.
In order to solve the technical problems, the invention adopts the following technical scheme:
the inputs of the network generator and the discriminator are the original 4-dimensional light field data, with angular resolution U×V and spatial resolution W×H. The network generator first uses a 3×3 spatial convolution and 8 channel attention residual modules (CAR) based on decomposition kernels, with input image connections and dense residual connections introduced in this part, to realize feature extraction; 2 angular convolutions are then used to realize feature fusion, and finally sub-pixel convolution is used to complete the spatial resolution upsampling. The super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, image features are extracted through 8 layers of decomposition kernel convolution, and finally the discrimination result is output through an average pooling layer and two 1×1 convolutions. The specific process of each step is as follows:
the specific design of the decomposition kernel convolution mentioned in the generator is as follows: the input light field image is five-dimensional data (u, v, w, h, c), where c represents the number of channels. The light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as sub-aperture images, with u×v regarded as a batch of view images; this is also called the spatial subspace. Similarly, the light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces. The decomposition kernel operation reshapes the light field data into each subspace in turn and performs a 2D convolution on the 2nd and 3rd dimensions of the reshaped data, which can be expressed by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, and this convolution is also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution. Each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} therefore represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
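As an illustration of the six subspaces, the following minimal PyTorch sketch shows how the five-dimensional light field data can be reshaped so that an ordinary 2D convolution sees each subspace as a batch of two-dimensional images; the tensor sizes and the to_subspace helper are assumptions for illustration, not the implementation of the invention:

```python
import torch

def to_subspace(lf, batch_dims, image_dims):
    """Reshape 5-D light field data (u, v, w, h, c) so that batch_dims form the
    batch axis and image_dims form the 2-D image a 2D convolution would see.
    The channel axis is kept last here purely for illustration."""
    names = {'u': 0, 'v': 1, 'w': 2, 'h': 3}
    order = [names[d] for d in batch_dims] + [names[d] for d in image_dims] + [4]
    x = lf.permute(*order).contiguous()
    return x.view(x.shape[0] * x.shape[1], x.shape[2], x.shape[3], x.shape[4])

lf = torch.randn(5, 5, 32, 32, 24)                  # (u, v, w, h, c)
spatial = to_subspace(lf, ('u', 'v'), ('w', 'h'))   # (u*v, w, h, c)  spatial subspace
angular = to_subspace(lf, ('w', 'h'), ('u', 'v'))   # (w*h, u, v, c)  angular subspace
epi_wv  = to_subspace(lf, ('w', 'v'), ('u', 'h'))   # (w*v, u, h, c)  one of the four EPI subspaces
print(spatial.shape, angular.shape, epi_wv.shape)
```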
Further, the structure of the channel attention residual module (CAR) based on decomposition kernels is as follows: a 1×1 decomposition kernel convolution is first placed at the front of the module as a bottleneck layer (BNL) to change the number of input channels of the module to a preset size; the data then passes through a feature extraction layer consisting of a 3×3 decomposition kernel convolution and is sent to the channel attention module, where the channel attention map is calculated; the obtained channel attention map is multiplied onto the input of the channel attention module as weights; and finally the result is added to the input of the whole module through a skip connection to obtain the final output of the module.
The steps of the channel attention module mentioned above are as follows: the input light field data first passes through a global pooling layer; because this pooling layer is used to compute channel attention, it acts on all dimensions of the data except the channel, i.e. the four dimensions of the spatial and angular dimensions. The number of dimensions is then reduced so that only the channel dimension remains. Through one downsampling 1×1 convolution and one upsampling 1×1 convolution, the channel dimension of the data is first reduced to C_r and then restored to C; this step realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions. Finally, after a sigmoid activation function, the data dimensions are expanded back to a form matching the input data, yielding the channel attention map of the input feature map.
The overall feature extraction process is as follows: after the light field data is input into the network generator, the number of channels of the data is expanded to a preset size through a 3×3 spatial convolution; the data then passes in sequence through a feature extraction part formed by 8 decomposition kernel channel attention modules connected in series. The input of each module is the concatenation, in the channel dimension, of the generator input and the outputs of all modules before it, i.e. an input dense residual connection structure formed by the original image connection and dense connections, so that the feature map information of every layer of the network can be fully utilized and the gradient vanishing phenomenon of the network can be reduced.
Finally, the light field super-resolution image generation step of the network is as follows: the feature maps obtained by the feature extraction part first pass through a 3×3 angular convolution, which fuses the large number of feature channels produced by the dense residual links and reduces them to a preset channel size in the channel dimension so as to fuse the information among all view images; a 3×3 spatial convolution then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation; finally, the sub-pixel convolution arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
In addition to the generator, the other part of the network structure, the discriminator, operates as follows: the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, and image features are first extracted through 8 layers of decomposition kernel convolution, in which all decomposition kernels have a size of 3×3. Except for the first decomposition kernel layer, the odd-numbered decomposition kernel convolution layers have twice as many output channels as input channels with a stride of 1, while the even-numbered decomposition kernel convolution layers keep the output channels equal to the input channels but use a stride of 2, so that the feature map size is halved each time the number of channels has been doubled. All decomposition kernels except the first are followed by a BN layer. Finally, after an average pooling layer and two 1×1 convolutions, the result is processed by a sigmoid activation function and the discrimination result is output. The closer the discrimination value is to 1, the higher the probability that the discriminator considers the input image to be a real high-resolution image; the closer it is to 0, the higher the probability that it considers the input image to be a generated super-resolution image. The discrimination result is fed back to the generator, thereby guiding the improvement of the generator.
The advantages and positive effects of the invention are as follows: compared with ordinary 2D convolution, the decomposition kernel convolution used by the invention can better extract the spatial and angular information of the light field; a channel attention mechanism is introduced so that the information in the channel dimension is added to the feature extraction, realizing more comprehensive extraction and fusion of the feature information of the light field data. On the other hand, the input dense residual connections also make the transfer of features and gradients in the generator more efficient, making the generator easier to train. The discriminator follows the structure of the SRGAN discriminator, which keeps the discriminator and the generator in balance and yields better generation results. Better performance can be obtained compared with other light field super-resolution algorithms.
Drawings
Fig. 1 is a schematic diagram of the structure of a generator network of the present invention.
Fig. 2 is a schematic diagram of the structure of the bottleneck layer in the generator.
Fig. 3 is a schematic structural diagram of a channel attention residual module based on decomposition kernel.
FIG. 4 is a schematic diagram of the operation of the decomposition kernel convolution.
Fig. 5 is a schematic structural diagram of a discriminator of the network of the invention.
Detailed Description
In order to make the features and technical solutions of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below.
As shown in fig. 1, the input of the network generator (Generator Network) is a low-resolution light field image (LR) obtained by downsampling a high-resolution light field image (HR), with angular resolution u×v and spatial resolution w×h. The network comprises a feature extraction part and an image generation part. The feature extraction part first performs shallow feature extraction through a single convolution layer and then uses 8 channel attention residual modules based on decomposition kernels (Channel Attention-based Residual Layer, CAR) to realize deep feature extraction. To reduce the gradient vanishing problem of the network, the feature extraction part also introduces dense residual connections, so a bottleneck layer (BNL) is added in front of each CAR module to ensure that the number of input feature channels of the CAR module stays consistent with the preset value. Decomposition kernel convolution (Decomposition Kernel Convolution, DKConv) is used in both the BNL and the CAR module to adapt to the 4-dimensional light field data. Finally, the generator outputs the generated super-resolution image (SR). The various parts of the overall network are described in detail below.
CAR module-based feature extraction
After the light field data is input into the network generator, the number of channels of the data is increased to a preset size by a 3×3 convolution (conv); the data then passes in sequence through a feature extraction part formed by 8 CAR modules connected in series, where the input of each module is the concatenation, in the channel dimension, of the generator input and the outputs of all modules before it.
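A minimal PyTorch sketch of this input dense residual wiring is given below; the CARBlock here is merely a stand-in for the bottleneck layer plus CAR module described in the following subsections, the light field is flattened to an ordinary image batch to keep the sketch short, and the channel count of 24 is an assumption:

```python
import torch
import torch.nn as nn

class CARBlock(nn.Module):
    """Stand-in for the bottleneck layer plus decomposition-kernel channel
    attention residual module: reduced here to a 1x1 bottleneck and a 3x3 conv."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.bnl = nn.Conv2d(in_ch, ch, 1)            # BNL: restore the preset channel count
        self.body = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return self.body(self.bnl(x))

class DenseResidualExtractor(nn.Module):
    def __init__(self, img_ch=1, ch=24, n_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(img_ch, ch, 3, padding=1)   # 3x3 shallow feature extraction
        # block i sees the original input, the shallow features and the outputs of
        # all i previous blocks, concatenated along the channel dimension
        self.blocks = nn.ModuleList(
            CARBlock(img_ch + ch * (i + 1), ch) for i in range(n_blocks))
    def forward(self, img):
        feats = [img, self.head(img)]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)                # dense feature maps for later fusion

x = torch.randn(2, 1, 48, 48)                 # a batch of single-channel (Y) patches
print(DenseResidualExtractor()(x).shape)      # torch.Size([2, 216, 48, 48])
```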
Bottleneck layer (BNL)
Because the feature extraction part adopts a dense residual structure, a bottleneck layer (BNL) is added in front of each CAR module; as shown in fig. 2, this layer consists of a 1×1 decomposition kernel convolution and ensures that the number of input channels of each module remains consistent.
Channel attention residual module (CAR) based on decomposition kernel
The decomposition kernel convolution fully extracts the information of the spatial and angular dimensions in the original light field data, but ignores the information in the channel dimension, and the information between feature maps of different channels has a positive effect on super-resolution reconstruction of the light field. Therefore, the invention connects a channel attention module after the decomposition kernel convolution to acquire the weights of different channels, thereby capturing the information between different feature maps; a residual structure is added to the module so that the gradients of the network do not vanish, which reduces the training difficulty of the network and improves its performance.
Specific flow of a single CAR module: as shown in fig. 3, feature extraction is first performed by a 3×3 decomposition kernel convolution before entering the channel attention module. The channel attention module passes the input feature map through a global pooling layer (GlobalPool); because it computes channel attention, the pooling acts on all dimensions of the feature map except the channel, i.e. the four dimensions of the spatial and angular dimensions. A reshape then reduces the number of dimensions so that only the channel dimension remains; through a downsampling and an upsampling 1×1 convolution (conv), the channel dimension of the data is first reduced to C_r and then restored to C, which realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions. Finally, after a sigmoid activation function, a reshape expands the data dimensions back to a form matching the input data, yielding the channel attention map of the input feature map. The obtained channel attention weights are multiplied onto the input of the channel attention module, and finally the result is added to the input of the whole module through a skip connection to obtain the final output of the module.
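The channel attention computation and the residual link can be sketched as follows in PyTorch; the (batch, C, u, v, w, h) layout, the reduction ratio and the plain convolution standing in for the 3×3 decomposition kernel convolution are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Pool over all non-channel dims, predict per-channel weights with a
    bottlenecked pair of 1x1 convolutions, and rescale the input."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        c_r = max(channels // reduction, 1)
        self.down = nn.Conv2d(channels, c_r, 1)   # downsampling 1x1 convolution
        self.act = nn.PReLU()
        self.up = nn.Conv2d(c_r, channels, 1)     # upsampling 1x1 convolution
    def forward(self, x):                         # x: (B, C, u, v, w, h)
        b, c = x.shape[:2]
        a = x.reshape(b, c, -1).mean(dim=2)       # global pooling over u, v, w, h
        a = torch.sigmoid(self.up(self.act(self.down(a.view(b, c, 1, 1)))))
        return x * a.view(b, c, 1, 1, 1, 1)       # broadcast attention weights onto the input

class CARBody(nn.Module):
    """Feature extraction, channel attention and a residual skip; a plain conv
    stands in for the 3x3 decomposition kernel convolution here."""
    def __init__(self, channels):
        super().__init__()
        self.feat = nn.Conv3d(channels, channels, 3, padding=1)
        self.ca = ChannelAttention(channels)
    def forward(self, x):                         # x: (B, C, u, v, w, h)
        b, c, u, v, w, h = x.shape
        y = self.feat(x.reshape(b, c, u * v, w, h)).reshape(b, c, u, v, w, h)
        return x + self.ca(y)                     # residual link with the module input

x = torch.randn(2, 24, 5, 5, 16, 16)
print(CARBody(24)(x).shape)   # torch.Size([2, 24, 5, 5, 16, 16])
```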
Decomposition kernel convolution (DKConv)
When the light field is reconstructed, the complementary information between images of different view angles of the light field has a very positive effect on the reconstruction result. Therefore, in order to extract the complementary information of the multi-view images, the invention adopts decomposition kernel convolution to adapt to the high-dimensional light field data and realize information extraction of the light field.
Before introducing the decomposition kernel, the dimension reduction principle of light field data is analyzed. The original light field data has five dimensions, namely the angular dimensions (u, v), the spatial dimensions (w, h) and the channel c. When the 4D light field image is reduced to two-dimensional images, the product of any two of the four angular and spatial dimensions is taken as the number of images in the batch and the remaining dimensions are expressed as two-dimensional images, which completes the dimension reduction operation. The light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as a sub-aperture image array, with u×v regarded as a batch of view images; this is also called the spatial subspace. The light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces, giving six subspaces in total.
The specific flow of a single decomposition kernel convolution is as follows: as shown in fig. 4, the input light field image is five-dimensional data (u, v, w, h, c). It is first reshaped (reshape) to (u×v, w, h, c), a 3×3 2D convolution (conv) is performed on its two image dimensions, and a ReLU activation function is applied. The next five steps reshape the light field data in turn to the remaining five data formats, each time applying a 3×3 convolution and a ReLU activation function, and the data is finally restored to the original five-dimensional light field format. Specifically, this can be represented by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution. Each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
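Under these formulas, a decomposition kernel convolution can be sketched as follows in PyTorch; the channels-first layout (B, C, u, v, w, h) and the channel count are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DKConv(nn.Module):
    """Apply one 2D convolution per subspace: spatial (w,h), angular (u,v) and the
    four EPI subspaces, in the order of the formula above."""
    # (dim_1, dim_2) pairs to convolve, for a tensor laid out as (B, C, u, v, w, h)
    SUBSPACES = [(4, 5), (2, 3), (4, 3), (2, 5), (2, 4), (5, 3)]  # (w,h),(u,v),(w,v),(u,h),(u,w),(h,v)

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size, padding=pad) for _ in self.SUBSPACES)

    def forward(self, x):                                  # x: (B, C, u, v, w, h)
        for conv, (d1, d2) in zip(self.convs, self.SUBSPACES):
            keep = [d for d in range(2, 6) if d not in (d1, d2)]
            perm = [0, *keep, 1, d1, d2]                   # batch the two untouched axes
            y = x.permute(perm)
            b0, k1, k2, c, s1, s2 = y.shape
            y = conv(y.reshape(b0 * k1 * k2, c, s1, s2))   # 2D conv on the chosen subspace
            y = F.relu(y)                                  # f(.) in the formula
            y = y.reshape(b0, k1, k2, c, s1, s2)
            x = y.permute(*[perm.index(i) for i in range(6)])  # undo the permutation
        return x

lf = torch.randn(1, 24, 5, 5, 16, 16)   # (B, C, u, v, w, h)
print(DKConv(24)(lf).shape)             # torch.Size([1, 24, 5, 5, 16, 16])
```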
Image generating section
The feature maps obtained by the feature extraction part first pass through a 3×3 convolution (conv) that performs feature fusion and reduction (Feature Reduction) on the multi-channel feature maps produced by the dense residual links, reducing the number of feature channels to a preset size so as to fuse the information among all view images. A 3×3 convolution (conv) then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation. Finally, a sub-pixel convolution layer (PixelShuffle) arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
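A minimal PyTorch sketch of this image generation part follows; plain 2D convolutions stand in for the angular and spatial convolutions, and the channel sizes, the upsampling factor a = 2 and the single output channel are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ImageGeneration(nn.Module):
    """Feature reduction, channel expansion to a^2 * c, then PixelShuffle upsampling."""
    def __init__(self, dense_ch=216, ch=24, scale=2, out_ch=1):
        super().__init__()
        self.reduce = nn.Conv2d(dense_ch, ch, 3, padding=1)              # fuse dense features
        self.expand = nn.Conv2d(ch, out_ch * scale ** 2, 3, padding=1)   # a^2 * c channels
        self.shuffle = nn.PixelShuffle(scale)                            # sub-pixel convolution

    def forward(self, feats):
        return self.shuffle(self.expand(self.reduce(feats)))

feats = torch.randn(2, 216, 48, 48)        # dense feature maps from the extractor
print(ImageGeneration()(feats).shape)      # torch.Size([2, 1, 96, 96])
```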
Discriminator (Discriminator Network)
The role of the discriminator in a generative adversarial network is very important: the output of the discriminator is the direction of generator optimization, so the discriminator needs to maintain a balance with the generator; if either side is too strong, the generator loses its optimization objective. As shown in fig. 5, the discriminator takes the SR image generated by the generator and the HR image as input; the main body of the network also uses decomposition kernel convolution to extract information from the light field image, and the output of the discriminator is fed back to the generator to guide its optimization. The specific flow of the network is as follows:
the super-resolution light field image and the original high-resolution light field image output by the generator are respectively sent to the discriminator, the image characteristics are firstly extracted through 8-layer decomposition kernel convolution, and a batch normalization layer (Batch Normalization, BN) is connected after all decomposition kernels except the first layer of decomposition kernel. The number of characteristic channels of the partial decomposition kernel convolution is gradually doubled from 24 to 192, and the size of the characteristic diagram is reduced to half of the original size by setting the step length of the decomposition kernel convolution to 2 when the number of the characteristic channels is doubled each time. The data output by the last layer of decomposition kernel convolution is processed by a sigmoid activation function through an average pooling layer (AvgPool) and two 1 multiplied by 1 convolutions (conv) to output a judging result. The closer the value of the discrimination result is to 1, the greater the probability that the representative discriminator considers the input image as a high-resolution image, the closer the discriminated value is to 0, and the greater the probability that the representative discriminator considers the input image as a generated super-resolution image. The discrimination results are fed back to the generator, thereby guiding the improvement of the generator.
The present invention was trained and tested on the STFLytro public dataset. First, the 5×5 view images at the center of the original light field image are selected, the RGB image of the original light field is converted into a YCbCr image, and a low-resolution image is generated by bicubic interpolation downsampling. The Y-channel image alone is sent into the network for learning to generate the high-resolution image, while the remaining Cb and Cr channels are directly upsampled by bicubic interpolation; the obtained Y-channel image and the CbCr images are finally merged to obtain the final high-resolution light field image. During training, patches of size 48×48 are cropped and subjected to random flipping and rotation to form the training data set. The network is built with the PyTorch framework and trained with a perceptual loss function, an adversarial loss function and an MSE loss function; the Adam method is used for optimization, the batch size is set to 2, the initial learning rate of both the generator and the discriminator is 10⁻⁴, the learning rate is halved every 100 epochs, and training ends after 700 epochs.
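The optimization schedule can be illustrated roughly as follows in PyTorch; the tiny stand-in networks, the dummy data loader and the adversarial loss weight are placeholders, the perceptual loss is omitted, and only the Adam optimizer, batch size 2, initial learning rate 10⁻⁴, halving every 100 epochs and 700 epochs follow the text:

```python
import torch
import torch.nn as nn

# tiny stand-ins for the generator and discriminator described above
generator = nn.Conv2d(1, 1, 3, padding=1)
discriminator = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)      # initial learning rate 1e-4
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=100, gamma=0.5)  # halve every 100 epochs
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=100, gamma=0.5)
mse, bce = nn.MSELoss(), nn.BCELoss()

loader = [(torch.randn(2, 1, 48, 48), torch.randn(2, 1, 48, 48))]  # dummy batch-size-2 loader

for epoch in range(700):                                        # 700 training epochs
    for lr_patch, hr_patch in loader:
        sr = generator(lr_patch)
        # discriminator step: push real HR toward 1 and generated SR toward 0
        opt_d.zero_grad()
        d_loss = bce(discriminator(hr_patch), torch.ones(2, 1)) + \
                 bce(discriminator(sr.detach()), torch.zeros(2, 1))
        d_loss.backward()
        opt_d.step()
        # generator step: MSE loss plus adversarial loss (perceptual loss omitted here)
        opt_g.zero_grad()
        g_loss = mse(sr, hr_patch) + 1e-3 * bce(discriminator(sr), torch.ones(2, 1))
        g_loss.backward()
        opt_g.step()
    sched_g.step()
    sched_d.step()
```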
The invention provides a light field image super-resolution network based on multi-view information interaction, which solves the problem of insufficient spatial resolution of light field images. The network first uses a generative adversarial structure to improve its learning ability. Then, in view of the specific structure of the 4-dimensional light field, decomposition kernel convolution is adopted to extract the spatial-angular information of the light field, and a channel attention mechanism is introduced to further extract the information in the light field data. The validity of the invention has been verified on the public dataset. In addition, aiming at the problem that the SSIM value of the network result is relatively low, the loss function structure was further optimized at a later stage, and a better result was obtained on the premise of ensuring the network performance.
While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (7)
1. A light field super-resolution network generation method based on multi-view information interaction, characterized by comprising the following steps: the inputs of the network generator and the discriminator are the original 4-dimensional light field data, with angular resolution U×V and spatial resolution W×H; the network generator first realizes feature extraction with a 3×3 spatial convolution and 8 channel attention residual modules based on decomposition kernels, introducing input image connections and dense residual connections; feature fusion is then realized with 2 angular convolutions, and finally spatial resolution upsampling is completed with sub-pixel convolution;
the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, image features are extracted through 8 layers of decomposition kernel convolution, and finally the discrimination result is output through an average pooling layer and two 1×1 convolutions.
2. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the specific design of the decomposition kernel convolution is: the input light field image is five-dimensional data (u, v, w, h, c), where c represents the number of channels; each low-dimensional subspace of the light field image is obtained after dimension reduction, wherein the light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as sub-aperture images, with u×v regarded as a batch of view images, also called the spatial subspace; similarly, the light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces, giving six subspaces in total; the decomposition kernel operation reshapes the light field data into each subspace in turn and performs a 2D convolution on the 2nd and 3rd dimensions of the reshaped data, which can be expressed by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution;

each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
3. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the structure of the channel attention residual module based on decomposition kernels is as follows: a 1×1 decomposition kernel convolution is first placed at the front of the channel attention residual module as a bottleneck layer to change the number of input channels of the channel attention residual module to a preset size; the data then passes through a feature extraction layer consisting of a 3×3 decomposition kernel convolution and is sent into the channel attention module, where the channel attention map is calculated; the obtained channel attention weights are multiplied onto the input of the channel attention module; and finally the result is added to the input of the whole channel attention residual module through a skip connection to obtain the final output of the channel attention residual module.
4. A multi-view information interaction-based light field super-resolution network generation method according to claim 3, characterized in that the working process of the channel attention module is as follows: the input light field data first passes through a global pooling layer; because this pooling layer is used to compute channel attention, it acts on all dimensions of the data except the channel, i.e. the four dimensions of the spatial and angular dimensions; the number of dimensions is then reduced so that only the channel dimension remains, and through one downsampling 1×1 convolution and one upsampling 1×1 convolution the channel dimension of the data is first reduced to C_r and then restored to C; this step realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions; finally, after a sigmoid activation function, the data dimensions are expanded back to a form matching the input data, yielding the channel attention map of the input feature map.
5. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the working process of the feature extraction is as follows: after the light field data is input into the network generator, the number of channels of the data is expanded to a preset size through a 3×3 spatial convolution; the data then passes in sequence through a feature extraction part formed by 8 bottleneck layers and channel attention residual modules connected in series, where the input of each channel attention residual module is the concatenation, in the channel dimension, of the generator input and the outputs before that module, i.e. an input dense residual connection structure formed by the original image connection and dense connections.
6. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the super-resolution light field image generation step is as follows: the obtained feature maps first pass through a 3×3 angular convolution, which fuses the large number of feature channels produced by the dense residual links and reduces them to a preset channel size in the channel dimension so as to fuse the information among all view images; a 3×3 spatial convolution then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation; finally, the sub-pixel convolution arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
7. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the working process of the discriminator is as follows: the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, and image features are first extracted through 8 layers of decomposition kernel convolution, in which all decomposition kernels have a size of 3×3; except for the first decomposition kernel layer, the odd-numbered decomposition kernel convolution layers have twice as many output channels as input channels with a stride of 1, while the even-numbered decomposition kernel convolution layers keep the output channels equal to the input channels but use a stride of 2, so that the feature map size is halved each time the number of channels has been doubled; all decomposition kernels except the first are followed by a BN layer; finally, the discrimination result is output through an average pooling layer and two 1×1 convolutions; the closer the discrimination value is to 1, the higher the probability that the discriminator considers the input image to be a high-resolution image, and the closer it is to 0, the higher the probability that the discriminator considers the input image to be a generated super-resolution image; the discrimination result is fed back to the generator, thereby guiding the improvement of the generator.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410553.3A CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410553.3A CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116957057A true CN116957057A (en) | 2023-10-27 |
Family
ID=88460848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310410553.3A Pending CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116957057A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118071603A (en) * | 2024-04-19 | 2024-05-24 | 浙江优众新材料科技有限公司 | Light field image super-resolution method, device and medium for space angle information interaction |
-
2023
- 2023-04-16 CN CN202310410553.3A patent/CN116957057A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |