CN116957057A - Multi-view information interaction-based light field image super-resolution network generation method - Google Patents
Multi-view information interaction-based light field image super-resolution network generation method
- Publication number
- CN116957057A (application CN202310410553.3A)
- Authority
- CN
- China
- Prior art keywords
- light field
- convolution
- image
- resolution
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a light field image super-resolution network generation method based on multi-view information interaction, belongs to the technical field of image super-resolution reconstruction, and aims to solve the problem of insufficient spatial resolution of light field images. The invention adopts a generative adversarial network as the overall structure to improve network performance. Decomposition kernel convolution is adopted, in view of the specific structure of light field data, to extract the spatial-angular information of the light field image, and it is combined with a channel attention mechanism to form a channel attention residual module based on decomposition kernels. The generator of the proposed network uses a plurality of such decomposition kernel channel attention residual modules connected in series, and an input dense residual structure is introduced in this part to realize feature extraction. The discriminator likewise uses decomposition kernel convolution to extract features and output the final discrimination result. The network can extract richer information from the light field data to realize light field super-resolution reconstruction.
Description
Technical Field
The invention belongs to the technical field of computational imaging, and particularly relates to a light field image super-resolution network generation method based on multi-view information interaction.
Background
With the development of technology, the field of camera imaging has advanced rapidly. When a traditional camera photographs an object, it only records the 2D projection of the light rays, so a large amount of the spatial and angular information of the light field is lost; to acquire the missing information, the scene must be photographed multiple times from different positions to obtain multi-view images. Compared with a traditional camera, a light field camera can obtain multi-view images of a scene in a single shot and acquire the spatial and angular information of the light field at the same time; after processing, the depth information, refocused view images and so on of the scene can be obtained, so its imaging efficiency is higher than that of a traditional camera. However, the cost of acquiring angular information with a light field camera is that the spatial resolution of each single view image is sacrificed, and this lower spatial resolution has a considerable influence on many kinds of data processing. Therefore, super-resolution technology for light field images is of great significance.
In recent years, thanks to the development of convolutional neural networks (Convolutional Neural Network, CNN) and the appearance of light field data sets, deep learning-based approaches have performed well in light field reconstruction. In 2016, Yoon et al. used a CNN for light field super-resolution reconstruction for the first time; the method used two cascaded convolutional neural networks to perform spatial and angular super-resolution reconstruction of light field images respectively. In 2017, Farrugia et al. proposed a dictionary learning-based method to learn the mapping between light field low-resolution and high-resolution images. In the same year, Gaochang Wu et al. of Tsinghua University studied a light field super-resolution reconstruction technique based on epipolar plane images (EPI). In 2018, Wang et al. designed a bidirectional recurrent CNN, used it to super-resolve the horizontal and vertical image stacks, and finally unified the stacks through stacked generalization to obtain the complete view images. Zhang et al. then used a multi-branch residual network to achieve spatial super-resolution reconstruction of the light field: the inputs of the different branches were sub-aperture images stacked in different directions, so as to learn the correlation between the sub-aperture images in different directions, and the extracted features were finally fused to reconstruct the light field. In 2020, Chen et al. applied the structure of the generative adversarial network (Generative Adversarial Network, GAN) to light field super-resolution reconstruction, and also proposed an EPI loss function to reduce the gap between the reconstruction result and the real light field.
On the other hand, improved convolution methods have also been proposed in recent years. In 2019, Meng et al. proposed a convolution method for the four-dimensional data of a light field, namely 4D convolution, which extracts the spatial and angular information of the light field simultaneously through high-dimensional convolution. In the same year, Yeung et al. proposed spatial-angular separable (SAS) convolution to address the low computational efficiency of 4D convolution, greatly improving the computation speed of the network while keeping performance on par with 4D convolution. In 2022, Hu et al. carried out a further dimension-reduction analysis of light field data on the basis of separable convolution and put forward the concept of the decomposition kernel; this convolution method can extract information from the four EPI subspaces of the light field as well as from the angular and spatial subspaces as in SAS convolution. Apart from separable convolution methods, Wang et al. in 2020 proposed a spatial feature extractor (SFE) and an angular feature extractor (AFE), which extract the spatial and angular information of a light field image respectively by changing the shape of the 2D convolution. In 2022, the same authors proposed horizontal and vertical EPI feature extractors on the basis of the SFE and AFE, further improving network performance. It is worth mentioning that at present the only improvement of light field GANs in the convolution direction is the LightGAN proposed by Meng et al. in 2020, which replaces the 2D convolution in the GAN network with 4D convolution to extract the high-dimensional information of the light field data and realize light field reconstruction.
In recent years, deep learning has made great progress in the field of light field image super-resolution, but the information in light field data is still insufficiently utilized; in particular, many methods pay no attention to the information of the light field data in the channel dimension during super-resolution.
Disclosure of Invention
Aiming at the problems of existing deep learning methods, the invention provides a light field image super-resolution network generation method based on multi-view information interaction. When performing light field image super-resolution, the network can more fully extract the complementary information between images of different view angles of the light field and apply it to super-resolution reconstruction, and it introduces a channel attention mechanism and a generative adversarial network structure.
In order to solve the technical problems, the invention adopts the following technical scheme:
the inputs of the network generator and the discriminator are the original 4-dimensional light field data, with angular resolution U×V and spatial resolution W×H. The network generator first uses a 3×3 spatial convolution and 8 channel attention residual modules (CAR) based on decomposition kernels, with input image connections and dense residual connections introduced in this part, to realize feature extraction; 2 angular convolutions are then used to realize feature fusion, and finally sub-pixel convolution is used to complete the spatial resolution upsampling. The super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, image features are extracted through 8 layers of decomposition kernel convolution, and finally the discrimination result is output through an average pooling layer and two 1×1 convolutions. The specific process of each step is as follows:
the specific design of the decomposition kernel convolution mentioned in the generator is as follows: the input light field image is five-dimensional data (u, v, w, h, c), where c represents the number of channels. The light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as sub-aperture images, with u×v regarded as a batch of view images; this is also called the spatial subspace. Similarly, the light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces. The decomposition kernel operation reshapes the light field data into each subspace in turn and performs a 2D convolution on the 2nd and 3rd dimensions of the reshaped data, which can be expressed by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, and this convolution is also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution. Each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} therefore represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
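As an illustration of the six subspaces, the following minimal PyTorch sketch shows how the five-dimensional light field data can be reshaped so that an ordinary 2D convolution sees each subspace as a batch of two-dimensional images; the tensor sizes and the to_subspace helper are assumptions for illustration, not the implementation of the invention:

```python
import torch

def to_subspace(lf, batch_dims, image_dims):
    """Reshape 5-D light field data (u, v, w, h, c) so that batch_dims form the
    batch axis and image_dims form the 2-D image a 2D convolution would see.
    The channel axis is kept last here purely for illustration."""
    names = {'u': 0, 'v': 1, 'w': 2, 'h': 3}
    order = [names[d] for d in batch_dims] + [names[d] for d in image_dims] + [4]
    x = lf.permute(*order).contiguous()
    return x.view(x.shape[0] * x.shape[1], x.shape[2], x.shape[3], x.shape[4])

lf = torch.randn(5, 5, 32, 32, 24)                  # (u, v, w, h, c)
spatial = to_subspace(lf, ('u', 'v'), ('w', 'h'))   # (u*v, w, h, c)  spatial subspace
angular = to_subspace(lf, ('w', 'h'), ('u', 'v'))   # (w*h, u, v, c)  angular subspace
epi_wv  = to_subspace(lf, ('w', 'v'), ('u', 'h'))   # (w*v, u, h, c)  one of the four EPI subspaces
print(spatial.shape, angular.shape, epi_wv.shape)
```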
Further, the structure of the channel attention residual module (CAR) based on decomposition kernels is as follows: a 1×1 decomposition kernel convolution is first placed at the front of the module as a bottleneck layer (BNL) to change the number of input channels of the module to a preset size; the data then passes through a feature extraction layer consisting of a 3×3 decomposition kernel convolution and is sent to the channel attention module, where the channel attention map is calculated; the obtained channel attention map is multiplied onto the input of the channel attention module as weights; and finally the result is added to the input of the whole module through a skip connection to obtain the final output of the module.
The steps of the channel attention module mentioned above are as follows: the input light field data first passes through a global pooling layer; because this pooling layer is used to compute channel attention, it acts on all dimensions of the data except the channel, i.e. the four dimensions of the spatial and angular dimensions. The number of dimensions is then reduced so that only the channel dimension remains. Through one downsampling 1×1 convolution and one upsampling 1×1 convolution, the channel dimension of the data is first reduced to C_r and then restored to C; this step realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions. Finally, after a sigmoid activation function, the data dimensions are expanded back to a form matching the input data, yielding the channel attention map of the input feature map.
The overall feature extraction process is as follows: after the light field data is input into the network generator, the number of channels of the data is expanded to a preset size through a 3×3 spatial convolution; the data then passes in sequence through a feature extraction part formed by 8 decomposition kernel channel attention modules connected in series. The input of each module is the concatenation, in the channel dimension, of the generator input and the outputs of all modules before it, i.e. an input dense residual connection structure formed by the original image connection and dense connections, so that the feature map information of every layer of the network can be fully utilized and the gradient vanishing phenomenon of the network can be reduced.
Finally, the light field super-resolution image generation step of the network is as follows: the feature maps obtained by the feature extraction part first pass through a 3×3 angular convolution, which fuses the large number of feature channels produced by the dense residual links and reduces them to a preset channel size in the channel dimension so as to fuse the information among all view images; a 3×3 spatial convolution then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation; finally, the sub-pixel convolution arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
In addition to the generator, the other part of the network structure, the discriminator, operates as follows: the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, and image features are first extracted through 8 layers of decomposition kernel convolution, in which all decomposition kernels have a size of 3×3. Except for the first decomposition kernel layer, the odd-numbered decomposition kernel convolution layers have twice as many output channels as input channels with a stride of 1, while the even-numbered decomposition kernel convolution layers keep the output channels equal to the input channels but use a stride of 2, so that the feature map size is halved each time the number of channels has been doubled. All decomposition kernels except the first are followed by a BN layer. Finally, after an average pooling layer and two 1×1 convolutions, the result is processed by a sigmoid activation function and the discrimination result is output. The closer the discrimination value is to 1, the higher the probability that the discriminator considers the input image to be a real high-resolution image; the closer it is to 0, the higher the probability that it considers the input image to be a generated super-resolution image. The discrimination result is fed back to the generator, thereby guiding the improvement of the generator.
The advantages and positive effects of the invention are as follows: compared with ordinary 2D convolution, the decomposition kernel convolution used by the invention can better extract the spatial and angular information of the light field; a channel attention mechanism is introduced so that the information in the channel dimension is added to the feature extraction, realizing more comprehensive extraction and fusion of the feature information of the light field data. On the other hand, the input dense residual connections also make the transfer of features and gradients in the generator more efficient, making the generator easier to train. The discriminator follows the structure of the SRGAN discriminator, which keeps the discriminator and the generator in balance and yields better generation results. Better performance can be obtained compared with other light field super-resolution algorithms.
Drawings
Fig. 1 is a schematic diagram of the structure of a generator network of the present invention.
Fig. 2 is a schematic diagram of the structure of the bottleneck layer in the generator.
Fig. 3 is a schematic structural diagram of a channel attention residual module based on decomposition kernel.
FIG. 4 is a schematic diagram of the operation of the decomposition kernel convolution.
Fig. 5 is a schematic structural diagram of a discriminator of the network of the invention.
Detailed Description
In order to make the features and technical solutions of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below.
As shown in fig. 1, the input of the network generator (Generator Network) is a low-resolution light field image (LR) obtained by downsampling a high-resolution light field image (HR), with angular resolution u×v and spatial resolution w×h. The network comprises a feature extraction part and an image generation part. The feature extraction part first performs shallow feature extraction through a single convolution layer and then uses 8 channel attention residual modules based on decomposition kernels (Channel Attention-based Residual Layer, CAR) to realize deep feature extraction. To reduce the gradient vanishing problem of the network, the feature extraction part also introduces dense residual connections, so a bottleneck layer (BNL) is added in front of each CAR module to ensure that the number of input feature channels of the CAR module stays consistent with the preset value. Decomposition kernel convolution (Decomposition Kernel Convolution, DKConv) is used in both the BNL and the CAR module to adapt to the 4-dimensional light field data. Finally, the generator outputs the generated super-resolution image (SR). The various parts of the overall network are described in detail below.
CAR module-based feature extraction
After the light field data is input into the network generator, the number of channels of the data is increased to a preset size by a 3×3 convolution (conv); the data then passes in sequence through a feature extraction part formed by 8 CAR modules connected in series, where the input of each module is the concatenation, in the channel dimension, of the generator input and the outputs of all modules before it.
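A minimal PyTorch sketch of this input dense residual wiring is given below; the CARBlock here is merely a stand-in for the bottleneck layer plus CAR module described in the following subsections, the light field is flattened to an ordinary image batch to keep the sketch short, and the channel count of 24 is an assumption:

```python
import torch
import torch.nn as nn

class CARBlock(nn.Module):
    """Stand-in for the bottleneck layer plus decomposition-kernel channel
    attention residual module: reduced here to a 1x1 bottleneck and a 3x3 conv."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.bnl = nn.Conv2d(in_ch, ch, 1)            # BNL: restore the preset channel count
        self.body = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return self.body(self.bnl(x))

class DenseResidualExtractor(nn.Module):
    def __init__(self, img_ch=1, ch=24, n_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(img_ch, ch, 3, padding=1)   # 3x3 shallow feature extraction
        # block i sees the original input, the shallow features and the outputs of
        # all i previous blocks, concatenated along the channel dimension
        self.blocks = nn.ModuleList(
            CARBlock(img_ch + ch * (i + 1), ch) for i in range(n_blocks))
    def forward(self, img):
        feats = [img, self.head(img)]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)                # dense feature maps for later fusion

x = torch.randn(2, 1, 48, 48)                 # a batch of single-channel (Y) patches
print(DenseResidualExtractor()(x).shape)      # torch.Size([2, 216, 48, 48])
```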
Bottleneck layer (BNL)
Because the feature extraction part adopts a dense residual structure, a bottleneck layer (BNL) is added in front of each CAR module; as shown in fig. 2, this layer consists of a 1×1 decomposition kernel convolution and ensures that the number of input channels of each module remains consistent.
Channel attention residual module (CAR) based on decomposition kernel
The decomposition kernel convolution fully extracts the information of the spatial and angular dimensions in the original light field data, but ignores the information in the channel dimension, and the information between feature maps of different channels has a positive effect on super-resolution reconstruction of the light field. Therefore, the invention connects a channel attention module after the decomposition kernel convolution to acquire the weights of different channels, thereby capturing the information between different feature maps; a residual structure is added to the module so that the gradients of the network do not vanish, which reduces the training difficulty of the network and improves its performance.
Specific flow of a single CAR module: as shown in fig. 3, feature extraction is first performed by a 3×3 decomposition kernel convolution before entering the channel attention module. The channel attention module passes the input feature map through a global pooling layer (GlobalPool); because it computes channel attention, the pooling acts on all dimensions of the feature map except the channel, i.e. the four dimensions of the spatial and angular dimensions. A reshape then reduces the number of dimensions so that only the channel dimension remains; through a downsampling and an upsampling 1×1 convolution (conv), the channel dimension of the data is first reduced to C_r and then restored to C, which realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions. Finally, after a sigmoid activation function, a reshape expands the data dimensions back to a form matching the input data, yielding the channel attention map of the input feature map. The obtained channel attention weights are multiplied onto the input of the channel attention module, and finally the result is added to the input of the whole module through a skip connection to obtain the final output of the module.
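The channel attention computation and the residual link can be sketched as follows in PyTorch; the (batch, C, u, v, w, h) layout, the reduction ratio and the plain convolution standing in for the 3×3 decomposition kernel convolution are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Pool over all non-channel dims, predict per-channel weights with a
    bottlenecked pair of 1x1 convolutions, and rescale the input."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        c_r = max(channels // reduction, 1)
        self.down = nn.Conv2d(channels, c_r, 1)   # downsampling 1x1 convolution
        self.act = nn.PReLU()
        self.up = nn.Conv2d(c_r, channels, 1)     # upsampling 1x1 convolution
    def forward(self, x):                         # x: (B, C, u, v, w, h)
        b, c = x.shape[:2]
        a = x.reshape(b, c, -1).mean(dim=2)       # global pooling over u, v, w, h
        a = torch.sigmoid(self.up(self.act(self.down(a.view(b, c, 1, 1)))))
        return x * a.view(b, c, 1, 1, 1, 1)       # broadcast attention weights onto the input

class CARBody(nn.Module):
    """Feature extraction, channel attention and a residual skip; a plain conv
    stands in for the 3x3 decomposition kernel convolution here."""
    def __init__(self, channels):
        super().__init__()
        self.feat = nn.Conv3d(channels, channels, 3, padding=1)
        self.ca = ChannelAttention(channels)
    def forward(self, x):                         # x: (B, C, u, v, w, h)
        b, c, u, v, w, h = x.shape
        y = self.feat(x.reshape(b, c, u * v, w, h)).reshape(b, c, u, v, w, h)
        return x + self.ca(y)                     # residual link with the module input

x = torch.randn(2, 24, 5, 5, 16, 16)
print(CARBody(24)(x).shape)   # torch.Size([2, 24, 5, 5, 16, 16])
```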
Decomposition kernel convolution (DKConv)
When the light field is reconstructed, the complementary information between images of different view angles of the light field has a very positive effect on the reconstruction result. Therefore, in order to extract the complementary information of the multi-view images, the invention adopts decomposition kernel convolution to adapt to the high-dimensional light field data and realize information extraction of the light field.
Before introducing the decomposition kernel, the dimension reduction principle of light field data is analyzed. The original light field data has five dimensions, namely the angular dimensions (u, v), the spatial dimensions (w, h) and the channel c. When the 4D light field image is reduced to two-dimensional images, the product of any two of the four angular and spatial dimensions is taken as the number of images in the batch and the remaining dimensions are expressed as two-dimensional images, which completes the dimension reduction operation. The light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as a sub-aperture image array, with u×v regarded as a batch of view images; this is also called the spatial subspace. The light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces, giving six subspaces in total.
The specific flow of a single decomposition kernel convolution is as follows: as shown in fig. 4, the input light field image is five-dimensional data (u, v, w, h, c). It is first reshaped (reshape) to (u×v, w, h, c), a 3×3 2D convolution (conv) is performed on its two image dimensions, and a ReLU activation function is applied. The next five steps reshape the light field data in turn to the remaining five data formats, each time applying a 3×3 convolution and a ReLU activation function, and the data is finally restored to the original five-dimensional light field format. Specifically, this can be represented by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution. Each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
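Under these formulas, a decomposition kernel convolution can be sketched as follows in PyTorch; the channels-first layout (B, C, u, v, w, h) and the channel count are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DKConv(nn.Module):
    """Apply one 2D convolution per subspace: spatial (w,h), angular (u,v) and the
    four EPI subspaces, in the order of the formula above."""
    # (dim_1, dim_2) pairs to convolve, for a tensor laid out as (B, C, u, v, w, h)
    SUBSPACES = [(4, 5), (2, 3), (4, 3), (2, 5), (2, 4), (5, 3)]  # (w,h),(u,v),(w,v),(u,h),(u,w),(h,v)

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size, padding=pad) for _ in self.SUBSPACES)

    def forward(self, x):                                  # x: (B, C, u, v, w, h)
        for conv, (d1, d2) in zip(self.convs, self.SUBSPACES):
            keep = [d for d in range(2, 6) if d not in (d1, d2)]
            perm = [0, *keep, 1, d1, d2]                   # batch the two untouched axes
            y = x.permute(perm)
            b0, k1, k2, c, s1, s2 = y.shape
            y = conv(y.reshape(b0 * k1 * k2, c, s1, s2))   # 2D conv on the chosen subspace
            y = F.relu(y)                                  # f(.) in the formula
            y = y.reshape(b0, k1, k2, c, s1, s2)
            x = y.permute(*[perm.index(i) for i in range(6)])  # undo the permutation
        return x

lf = torch.randn(1, 24, 5, 5, 16, 16)   # (B, C, u, v, w, h)
print(DKConv(24)(lf).shape)             # torch.Size([1, 24, 5, 5, 16, 16])
```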
Image generating section
The feature maps obtained by the feature extraction part first pass through a 3×3 convolution (conv) that performs feature fusion and reduction (Feature Reduction) on the multi-channel feature maps produced by the dense residual links, reducing the number of feature channels to a preset size so as to fuse the information among all view images. A 3×3 convolution (conv) then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation. Finally, a sub-pixel convolution layer (PixelShuffle) arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
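A minimal PyTorch sketch of this image generation part follows; plain 2D convolutions stand in for the angular and spatial convolutions, and the channel sizes, the upsampling factor a = 2 and the single output channel are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ImageGeneration(nn.Module):
    """Feature reduction, channel expansion to a^2 * c, then PixelShuffle upsampling."""
    def __init__(self, dense_ch=216, ch=24, scale=2, out_ch=1):
        super().__init__()
        self.reduce = nn.Conv2d(dense_ch, ch, 3, padding=1)              # fuse dense features
        self.expand = nn.Conv2d(ch, out_ch * scale ** 2, 3, padding=1)   # a^2 * c channels
        self.shuffle = nn.PixelShuffle(scale)                            # sub-pixel convolution

    def forward(self, feats):
        return self.shuffle(self.expand(self.reduce(feats)))

feats = torch.randn(2, 216, 48, 48)        # dense feature maps from the extractor
print(ImageGeneration()(feats).shape)      # torch.Size([2, 1, 96, 96])
```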
Discriminator (Discriminator Network)
The role of the discriminator in a generative adversarial network is very important: the output of the discriminator is the direction of generator optimization, so the discriminator needs to maintain a balance with the generator; if either side is too strong, the generator loses its optimization objective. As shown in fig. 5, the discriminator takes the SR image generated by the generator and the HR image as input; the main body of the network also uses decomposition kernel convolution to extract information from the light field image, and the output of the discriminator is fed back to the generator to guide its optimization. The specific flow of the network is as follows:
the super-resolution light field image and the original high-resolution light field image output by the generator are respectively sent to the discriminator, the image characteristics are firstly extracted through 8-layer decomposition kernel convolution, and a batch normalization layer (Batch Normalization, BN) is connected after all decomposition kernels except the first layer of decomposition kernel. The number of characteristic channels of the partial decomposition kernel convolution is gradually doubled from 24 to 192, and the size of the characteristic diagram is reduced to half of the original size by setting the step length of the decomposition kernel convolution to 2 when the number of the characteristic channels is doubled each time. The data output by the last layer of decomposition kernel convolution is processed by a sigmoid activation function through an average pooling layer (AvgPool) and two 1 multiplied by 1 convolutions (conv) to output a judging result. The closer the value of the discrimination result is to 1, the greater the probability that the representative discriminator considers the input image as a high-resolution image, the closer the discriminated value is to 0, and the greater the probability that the representative discriminator considers the input image as a generated super-resolution image. The discrimination results are fed back to the generator, thereby guiding the improvement of the generator.
The present invention was trained and tested on the STFLytro public dataset. First, the 5×5 view images at the center of the original light field image are selected, the RGB image of the original light field is converted into a YCbCr image, and a low-resolution image is generated by bicubic interpolation downsampling. The Y-channel image alone is sent into the network for learning to generate the high-resolution image, while the remaining Cb and Cr channels are directly upsampled by bicubic interpolation; the obtained Y-channel image and the CbCr images are finally merged to obtain the final high-resolution light field image. During training, patches of size 48×48 are cropped and subjected to random flipping and rotation to form the training data set. The network is built with the PyTorch framework and trained with a perceptual loss function, an adversarial loss function and an MSE loss function; the Adam method is used for optimization, the batch size is set to 2, the initial learning rate of both the generator and the discriminator is 10⁻⁴, the learning rate is halved every 100 epochs, and training ends after 700 epochs.
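The optimization schedule can be illustrated roughly as follows in PyTorch; the tiny stand-in networks, the dummy data loader and the adversarial loss weight are placeholders, the perceptual loss is omitted, and only the Adam optimizer, batch size 2, initial learning rate 10⁻⁴, halving every 100 epochs and 700 epochs follow the text:

```python
import torch
import torch.nn as nn

# tiny stand-ins for the generator and discriminator described above
generator = nn.Conv2d(1, 1, 3, padding=1)
discriminator = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)      # initial learning rate 1e-4
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=100, gamma=0.5)  # halve every 100 epochs
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=100, gamma=0.5)
mse, bce = nn.MSELoss(), nn.BCELoss()

loader = [(torch.randn(2, 1, 48, 48), torch.randn(2, 1, 48, 48))]  # dummy batch-size-2 loader

for epoch in range(700):                                        # 700 training epochs
    for lr_patch, hr_patch in loader:
        sr = generator(lr_patch)
        # discriminator step: push real HR toward 1 and generated SR toward 0
        opt_d.zero_grad()
        d_loss = bce(discriminator(hr_patch), torch.ones(2, 1)) + \
                 bce(discriminator(sr.detach()), torch.zeros(2, 1))
        d_loss.backward()
        opt_d.step()
        # generator step: MSE loss plus adversarial loss (perceptual loss omitted here)
        opt_g.zero_grad()
        g_loss = mse(sr, hr_patch) + 1e-3 * bce(discriminator(sr), torch.ones(2, 1))
        g_loss.backward()
        opt_g.step()
    sched_g.step()
    sched_d.step()
```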
The invention provides a light field image super-resolution network based on multi-view information interaction, which solves the problem of insufficient spatial resolution of light field images. The network first uses a generative adversarial structure to improve its learning ability. Then, in view of the specific structure of the 4-dimensional light field, decomposition kernel convolution is adopted to extract the spatial-angular information of the light field, and a channel attention mechanism is introduced to further extract the information in the light field data. The validity of the invention has been verified on the public dataset. In addition, aiming at the problem that the SSIM value of the network result is relatively low, the loss function structure was further optimized at a later stage, and a better result was obtained on the premise of ensuring the network performance.
While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (7)
1. A light field super-resolution network generation method based on multi-view information interaction, characterized by comprising the following steps: the inputs of the network generator and the discriminator are the original 4-dimensional light field data, with angular resolution U×V and spatial resolution W×H; the network generator first realizes feature extraction with a 3×3 spatial convolution and 8 channel attention residual modules based on decomposition kernels, introducing input image connections and dense residual connections; feature fusion is then realized with 2 angular convolutions, and finally spatial resolution upsampling is completed with sub-pixel convolution;
the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, image features are extracted through 8 layers of decomposition kernel convolution, and finally the discrimination result is output through an average pooling layer and two 1×1 convolutions.
2. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the specific design of the decomposition kernel convolution is: the input light field image is five-dimensional data (u, v, w, h, c), where c represents the number of channels; each low-dimensional subspace of the light field image is obtained after dimension reduction, wherein the light field data whose dimensions are expressed as (u×v, w, h, c) is regarded as sub-aperture images, with u×v regarded as a batch of view images, also called the spatial subspace; similarly, the light field data whose dimensions are expressed as (w×h, u, v, c) is the angular subspace, and the light field data whose dimensions are expressed as (u×h, w, v, c), (w×v, u, h, c), (v×h, u, w, c) and (w×u, h, v, c) are the EPI subspaces, giving six subspaces in total; the decomposition kernel operation reshapes the light field data into each subspace in turn and performs a 2D convolution on the 2nd and 3rd dimensions of the reshaped data, which can be expressed by the following formula:
DKConv(L) = k_{h,v}(k_{u,w}(k_{u,h}(k_{w,v}(k_{u,v}(k_{w,h}(L))))))

where L represents the original five-dimensional light field data; k_{w,h} denotes convolving the spatial subspace, also called the spatial convolution; k_{u,v} denotes convolving the angular subspace, also called the angular convolution; k_{w,v}, k_{u,h}, k_{u,w} and k_{h,v} denote convolving the EPI subspaces; DKConv denotes the decomposition kernel convolution;

each individual convolution operation can be expressed by the following formula:

k_{d1,d2}(L) = f(W * L_{(d1,d2)})

where * represents the 2D convolution operation, W represents the weight of the corresponding 2D convolution, L_{(d1,d2)} represents the corresponding subspace obtained by reshaping the original five-dimensional light field data L, and f(·) represents the ReLU activation function; k_{d1,d2} represents reshaping the light field data and performing the convolution operation in the dimensions (d1, d2).
3. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the structure of the channel attention residual module based on decomposition kernels is as follows: a 1×1 decomposition kernel convolution is first placed at the front of the channel attention residual module as a bottleneck layer to change the number of input channels of the channel attention residual module to a preset size; the data then passes through a feature extraction layer consisting of a 3×3 decomposition kernel convolution and is sent into the channel attention module, where the channel attention map is calculated; the obtained channel attention weights are multiplied onto the input of the channel attention module; and finally the result is added to the input of the whole channel attention residual module through a skip connection to obtain the final output of the channel attention residual module.
4. A multi-view information interaction-based light field super-resolution network generation method according to claim 3, characterized in that the working process of the channel attention module is as follows: the input light field data first passes through a global pooling layer; because this pooling layer is used to compute channel attention, it acts on all dimensions of the data except the channel, i.e. the four dimensions of the spatial and angular dimensions; the number of dimensions is then reduced so that only the channel dimension remains, and through one downsampling 1×1 convolution and one upsampling 1×1 convolution the channel dimension of the data is first reduced to C_r and then restored to C; this step realizes prediction of the weight of each channel of the feature map, with a PReLU activation function placed between the two convolutions; finally, after a sigmoid activation function, the data dimensions are expanded back to a form matching the input data, yielding the channel attention map of the input feature map.
5. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the working process of the feature extraction is as follows: after the light field data is input into the network generator, the number of channels of the data is expanded to a preset size through a 3×3 spatial convolution; the data then passes in sequence through a feature extraction part formed by 8 bottleneck layers and channel attention residual modules connected in series, where the input of each channel attention residual module is the concatenation, in the channel dimension, of the generator input and the outputs before that module, i.e. an input dense residual connection structure formed by the original image connection and dense connections.
6. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the super-resolution light field image generation step is as follows: the obtained feature maps first pass through a 3×3 angular convolution, which fuses the large number of feature channels produced by the dense residual links and reduces them to a preset channel size in the channel dimension so as to fuse the information among all view images; a 3×3 spatial convolution then expands the number of channels to the square of the upsampling factor times the number of input channels, i.e. a²c, in preparation for the sub-pixel convolution operation; finally, the sub-pixel convolution arranges and fuses the pixels at the same position of the feature maps of all channels in order, realizing upsampling of the spatial resolution of the light field image to a times that of the input light field image.
7. The multi-view information interaction-based light field super-resolution network generation method according to claim 1, wherein the working process of the discriminator is as follows: the super-resolution light field image output by the generator and the original high-resolution light field image are each sent to the discriminator, and image features are first extracted through 8 layers of decomposition kernel convolution, in which all decomposition kernels have a size of 3×3; except for the first decomposition kernel layer, the odd-numbered decomposition kernel convolution layers have twice as many output channels as input channels with a stride of 1, while the even-numbered decomposition kernel convolution layers keep the output channels equal to the input channels but use a stride of 2, so that the feature map size is halved each time the number of channels has been doubled; all decomposition kernels except the first are followed by a BN layer; finally, the discrimination result is output through an average pooling layer and two 1×1 convolutions; the closer the discrimination value is to 1, the higher the probability that the discriminator considers the input image to be a high-resolution image, and the closer it is to 0, the higher the probability that the discriminator considers the input image to be a generated super-resolution image; the discrimination result is fed back to the generator, thereby guiding the improvement of the generator.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410553.3A CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410553.3A CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116957057A true CN116957057A (en) | 2023-10-27 |
Family
ID=88460848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310410553.3A Pending CN116957057A (en) | 2023-04-16 | 2023-04-16 | Multi-view information interaction-based light field image super-resolution network generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116957057A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118071603A (en) * | 2024-04-19 | 2024-05-24 | 浙江优众新材料科技有限公司 | Light field image super-resolution method, device and medium for space angle information interaction |
-
2023
- 2023-04-16 CN CN202310410553.3A patent/CN116957057A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |