CN112750076B - Light field multi-view image super-resolution reconstruction method based on deep learning - Google Patents

Light field multi-view image super-resolution reconstruction method based on deep learning

Info

Publication number
CN112750076B
CN112750076B (application number CN202010284067.8A)
Authority
CN
China
Prior art keywords
feature
light field
view
resolution
fusion
Prior art date
Legal status
Active
Application number
CN202010284067.8A
Other languages
Chinese (zh)
Other versions
CN112750076A (en)
Inventor
赵圆圆
李浩天
Current Assignee
Yimu Shanghai Technology Co ltd
Original Assignee
Yimu Shanghai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Yimu (Shanghai) Technology Co., Ltd.
Priority to CN202010284067.8A
Publication of CN112750076A
Application granted
Publication of CN112750076B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformation in the plane of the image
    • G06T 3/40 - Scaling the whole image or part thereof
    • G06T 3/4053 - Super resolution, i.e. output image resolution higher than sensor resolution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10052 - Images from lightfield camera
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20221 - Image fusion; Image merging

Abstract

A light field multi-view image super-resolution reconstruction method based on deep learning: a training set of high-resolution and low-resolution image pairs is constructed from multi-view images, distributed in an N×N array, obtained from a light field camera or a light field camera array; a multilayer feature extraction network is constructed from the N×N light field multi-view image array to N×N light field multi-view feature images; the feature images are stacked and a feature fusion and enhancement multilayer convolutional network is constructed to obtain 4D light field structural features that can be used for reconstructing the light field multi-view images; an up-sampling module is constructed to obtain the nonlinear mapping relationship from the 4D light field structural features to the high-resolution N×N light field multi-view image; a loss function is constructed on the basis of the multi-scale feature fusion network, the network is trained, and its parameters are fine-tuned; finally, a low-resolution N×N light field multi-view image is input into the trained network to obtain the high-resolution N×N light field multi-view image.

Description

Light field multi-view image super-resolution reconstruction method based on deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a light field multi-view image super-resolution reconstruction method based on deep learning.
Background
A light field camera can capture the spatial position and the incidence angle of light rays at the same time; however, the recorded light field is subject to a trade-off between spatial resolution and angular resolution, and the limited spatial resolution of the multi-view images restricts the application range of light field cameras to a certain extent. Camera arrays are likewise constrained by cost and resolution, which limits the development of fields such as 3D light field display, 3D modeling and 3D measurement. With the continuous development of image processing, there is a growing demand for super-resolution techniques for light field multi-view images.
In recent years, the development of convolutional neural networks has provided better solutions for super-resolution. As a nonlinear optimization method, the convolutional neural network has yielded many good solutions and ideas for super-resolution of conventional images. However, existing convolutional-network-based light field multi-view super-resolution methods exploit only the mutual relations among the light field views and do not consider the loss of image texture information across multiple scales; moreover, the feature information hidden in the 4D light field is used too simply, so the super-resolution effect is not good.
Disclosure of Invention
The invention provides a light field multi-view image super-resolution reconstruction method based on deep learning, and aims to solve the problem that the existing super-resolution method for light field multi-view images cannot meet technical indexes.
In one embodiment of the present invention, a light field multi-view image super-resolution reconstruction method based on deep learning includes the following steps:
a1, constructing a training set of high-resolution and low-resolution image pairs by using multi-view images which are acquired from a light field camera or a light field camera array and distributed in an NxN array shape;
a2, constructing a multilayer characteristic extraction network from an NxN light field multi-view image array to an NxN light field multi-view characteristic image;
a3, stacking the characteristic images, constructing a characteristic fusion and enhancement multilayer convolution network, and obtaining 4D light field structural characteristics which can be used for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on the multi-scale feature fusion network, training the network, and fine-tuning the network parameters;
a6, inputting the low-resolution N×N light field multi-view image into the trained network to obtain the high-resolution N×N light field multi-view image.
The light field multi-view image super-resolution method based on the multi-scale fusion features provided by the embodiment of the invention has the following advantages:
1. the characteristics of the light field multi-view images are fully utilized, the inherent structure information in the 4D light field is explored through the multi-scale feature extraction module, then the extracted texture information is fused and enhanced through the fusion module, and finally the super-resolution of the light field multi-view image array is achieved through the up-sampling module.
2. The super-resolution result of the light field multi-view image can be used for light field depth estimation, more clues can be provided for shielding or edge areas, and the calculation accuracy of the disparity map is enhanced to a certain extent.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a flow chart of a super-resolution method for a light field multi-view image based on multi-scale fusion features according to one embodiment of the invention.
Fig. 2 is a schematic structural diagram of a super-resolution network of a light-field multi-view image according to one embodiment of the present invention.
FIG. 3 is a schematic diagram of an atrous spatial pyramid pooling (ASPP) block consisting of parallel atrous convolutions with different dilation rates according to one embodiment of the present invention.
Fig. 4 is a schematic diagram of a network structure of a fusion block according to one embodiment of the present invention.
Fig. 5 is a comparison table of bicubic interpolation and the method of the embodiment of the present invention under the PSNR and SSIM evaluation indexes on three different images from the synthetic dataset.
Fig. 6 is a comparison table of bicubic interpolation and the method of the embodiment of the present invention under the PSNR and SSIM evaluation indexes on three different images from the real dataset.
Detailed Description
According to one or more embodiments, as shown in fig. 1, a light field multi-view image super-resolution method based on multi-scale fusion features includes the following steps:
a1, constructing a training set of a high-resolution and low-resolution image pair by using a light field camera multi-view image or a light field camera array image (N multiplied by N array-shaped distributed multi-view images);
a2, constructing a multilayer characteristic extraction network from the NxN light field multi-view image array to the NxN light field multi-view characteristic image;
a3, stacking the characteristic images, constructing a characteristic fusion and enhancement multilayer convolution network, and obtaining 4D light field structural characteristics which can be used for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on the multi-scale feature fusion network, training, and finely adjusting network parameters;
and A6, inputting the low-resolution NxN light field multi-view image into the trained network to obtain the high-resolution NxN light field multi-view image.
According to one or more embodiments, the specific process of constructing the training set of high-resolution and low-resolution image pairs by using the light field camera multi-view images or the light field camera array images (multi-view images distributed in an N×N array) in step A1 is as follows:
Step A1.1, the multi-view image G_HR distributed in an N×N array is first down-sampled by a factor of 2 using bicubic interpolation to obtain the low-resolution N×N light field multi-view image G_LR.
Step A1.2, the low-resolution light field multi-view image G_LR is then cut into small patches with a spatial size of M×M pixels using a stride of K pixels, and the high-resolution light field multi-view image G_HR is correspondingly cut into patches of 2M×2M pixels.
Step A1.3, normalization and regularization are applied to the two light field multi-view images respectively so that every pixel value lies in the range [0, 1], thereby forming the input data and ground-truth data of the deep learning network model in this embodiment.
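The following PyTorch sketch illustrates one way step A1 could be implemented. It is not the patented code: the tensor layout [N*N, C, H, W], the default patch size M = 64 and stride K = 32, and the 0-255 input range are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def build_training_pairs(views: torch.Tensor, M: int = 64, K: int = 32):
    """views: [N*N, C, H, W] high-resolution views with values in [0, 255].
    Returns a list of (low-res MxM, high-res 2Mx2M) patch pairs, pixels scaled to [0, 1]."""
    hr = views.float() / 255.0                                         # normalize to [0, 1]
    lr = F.interpolate(hr, scale_factor=0.5, mode="bicubic",
                       align_corners=False).clamp(0.0, 1.0)            # 2x bicubic down-sampling
    pairs = []
    _, _, h, w = lr.shape
    for y in range(0, h - M + 1, K):                                   # slide an MxM window, stride K
        for x in range(0, w - M + 1, K):
            lr_patch = lr[:, :, y:y + M, x:x + M]
            hr_patch = hr[:, :, 2 * y:2 * (y + M), 2 * x:2 * (x + M)]  # matching 2M x 2M crop
            pairs.append((lr_patch, hr_patch))
    return pairs
```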
According to one or more embodiments, as shown in fig. 2, the specific process of constructing the multi-layer feature extraction network from the N×N light field multi-view image array to the N×N light field multi-view feature images in step A2 is as follows:
Step A2.1, low-level features of the low-resolution light field multi-view images are extracted by 1 conventional convolution and 1 residual block (ResB);
Step A2.2, multi-scale feature extraction and feature fusion are performed on the extracted low-level features by a ResASPP block (residual atrous spatial pyramid pooling) and a residual block that alternate twice, thereby obtaining the mid-level features of each light field multi-view image.
The ResASPP block consists of 3 ASPP blocks with identical structural parameters, cascaded and added to the upstream input in the form of a residual connection. As shown in fig. 3, an atrous spatial pyramid pooling (ASPP) block performs multi-scale feature extraction on the upstream input using mutually parallel atrous convolutions with different dilation rates; in each ASPP block, 3 atrous convolutions first extract features from the upstream input with dilation rates d = 1, 4, 8, respectively, and the resulting multi-scale features are then fused by a 1×1 convolution kernel.
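As an illustration of the ASPP and ResASPP structure described above, the following sketch uses three parallel atrous convolutions with dilation rates 1, 4, 8 fused by a 1×1 convolution, and cascades three such blocks behind a residual connection; the channel width and the Leaky ReLU activation are assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Three parallel atrous convolutions (dilation 1, 4, 8) fused by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 4, 8)]
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)   # 1x1 kernel fusing the three scales
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.fuse(torch.cat([self.act(b(x)) for b in self.branches], dim=1))

class ResASPP(nn.Module):
    """Three cascaded ASPP blocks with identical structure, added back to the upstream input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(ASPP(channels), ASPP(channels), ASPP(channels))

    def forward(self, x):
        return x + self.body(x)   # residual connection to the upstream input
```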
According to one or more embodiments, the specific process of stacking the feature images and constructing a feature fusion and enhancement multilayer convolution network in step A3, to obtain the 4D light field structural features that can be used for reconstructing the light field multi-view images, is as follows:
Step A3.1, each view of the multi-scale feature map array Q_0 ∈ R^{NH×NW×C} is stacked along the channel dimension C in order from top-left to bottom-right, where H and W denote the numbers of rows and columns of each view, N denotes the number of views in a single direction (N×N views in total), and C denotes the number of channels; this yields the feature map Q ∈ R^{H×W×(N×N×C)}.
Step A3.2, the stacked feature map Q ∈ R^{H×W×(N×N×C)} is sent as input to the global feature fusion module: feature re-extraction is first performed on the stacked multi-scale features by 3 conventional convolutions, and feature fusion is then performed by 1 residual block.
Step A3.3, the result then enters the fusion block for feature enhancement. By extracting the angular features in the 4D light field, the fusion block accumulates additional texture detail onto the original features. The enhanced features are sent to 4 cascaded residual blocks for full feature fusion, finally producing the 4D light field structural features that can be used for super-resolution reconstruction of the light field images.
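A minimal sketch of steps A3.1 and A3.2, assuming the per-view feature maps are stored as a [N*N, C, H, W] tensor; the fusion module's channel width is an illustrative assumption.

```python
import torch
import torch.nn as nn

def stack_views(q: torch.Tensor) -> torch.Tensor:
    """Stack N*N per-view feature maps [N*N, C, H, W] along the channel axis -> [1, N*N*C, H, W]."""
    n_views, c, h, w = q.shape
    return q.reshape(1, n_views * c, h, w)

class GlobalFeatureFusion(nn.Module):
    """Step A3.2: re-extraction with 3 plain convolutions followed by 1 residual block."""
    def __init__(self, in_ch: int, mid_ch: int = 64):
        super().__init__()
        self.re_extract = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
        )
        self.res = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
        )

    def forward(self, x):
        x = self.re_extract(x)
        return x + self.res(x)   # residual block performing the feature fusion
```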
The fusion block performs feature fusion and enhancement on the extracted multi-scale features and adopts the network structure shown in fig. 4. The central-view image can be transformed by a certain "warping" to generate the other, surrounding-view images, and vice versa. The process of generating a peripheral view from the central view can be described mathematically as:
G_{s',t'} = M_{st→s't'} · W_{st→s't'} · G_{s,t} + N_{st→s't'}
where G_{s,t} denotes the central-view image, G_{s',t'} denotes one of the other peripheral-view images, W_{st→s't'} is the "warping matrix", N_{st→s't'} is the error term between the view generated by the warping transformation and the original multi-view image G_{s',t'}, and M_{st→s't'} is the "mask" matrix used to remove the effect of occlusion between views.
As shown in FIG. 4, each peripheral-view feature Q_{s',t'} in the N×N feature map array can generate a central-view feature Q'_{s,t} through the "warping transformation" W_{s't'→st}, as indicated by the feature block labeled (1). Likewise, the central-view feature Q_{s,t}, subjected to the "warping transformation" W_{st→s't'}, generates the corresponding peripheral-view features, as shown in the feature block labeled (2) in Fig. 4. The foregoing process can be expressed as:
Q'_{s,t} = W_{s't'→st} ⊗ Q_{s',t'},   Q'_{s',t'} = W_{st→s't'} ⊗ Q_{s,t}
where ⊗ denotes batch matrix multiplication. The module then applies "mask" processing to feature blocks (1) and (2) respectively to deal with the occlusion existing between different views. The mask matrix is obtained by taking the absolute value of the error term between the generated view and the original view; the larger this absolute value, the more likely the region is occluded. Specifically:
M_{s't'→s,t} = 0 where |Q'_{s,t} - Q_{s,t}| > T, and 1 elsewhere
where T = 0.9 × max(‖Q'_{s,t} - Q_{s,t}‖₁) is an empirical threshold set in the algorithm, and the mask matrix M_{st→s't'} is obtained in the same way as M_{s't'→s,t}. The occluded regions in feature blocks (1) and (2) are then filtered out:
Q̂_{s,t} = M_{s't'→s,t} · Q'_{s,t},   Q̂_{s',t'} = M_{st→s't'} · Q'_{s',t'}
where Q̂_{s,t} and Q̂_{s',t'} are the feature blocks obtained after the mask processing. Since n = N×N - 1 central-view feature maps are formed in the above process, they are normalized to obtain the feature map labeled (3) in Fig. 4:
Q̄_{s,t} = (1/n) Σ_{k=1}^{n} Q̂_{s,t}^{(k)}
where k is the index of the views other than the central view when the N×N feature map array is arranged from top-left to bottom-right, and Q̂_{s,t}^{(k)} is the k-th of these masked central-view feature maps. The feature map at the central position is then replaced by the feature map labeled (3), yielding the globally fused feature block (4). Feature block (4) is added to the originally input multi-scale features to realize feature enhancement, finally giving the feature block (5) after feature fusion and enhancement.
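The occlusion-mask and averaging logic above can be sketched as follows. The warping step is omitted, and the binary mask, the per-pixel L1 error and the plain averaging of the n masked central-view estimates are assumptions consistent with the text rather than the exact patented computation.

```python
import torch

def mask_and_average(warped_center: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
    """warped_center: [n, C, H, W] central-view estimates (n = N*N - 1); center: [C, H, W]."""
    err = (warped_center - center.unsqueeze(0)).abs().sum(dim=1, keepdim=True)  # per-pixel L1 error
    T = 0.9 * err.max()                      # empirical threshold, 0.9 x largest error
    mask = (err <= T).float()                # 0 where occluded, 1 elsewhere
    filtered = mask * warped_center          # filter out the occluded regions
    return filtered.mean(dim=0)              # average the n estimates -> feature map (3)
```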
According to one or more embodiments, the specific process of constructing the up-sampling module in step A4 to obtain the nonlinear mapping relationship from the 4D light field structural features to the high-resolution N×N light field multi-view image is as follows:
Step A4.1, using sub-pixel convolution, a feature map with r²×C channels is first generated from the input feature map with C channels;
Step A4.2, the obtained r²×C-channel feature map is then rearranged to generate a high-resolution feature map whose resolution is r times that of the input.
And step A4.3, sending the high-resolution feature map to 1 conventional convolutional layer for feature fusion, and finally generating the super-resolution light field multi-view image array.
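A sketch of the sub-pixel up-sampling module of step A4 using PyTorch's PixelShuffle; kernel sizes and the width of the final fusion layer are illustrative.

```python
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Sub-pixel convolution: expand to r^2 * C channels, pixel-shuffle to r-times resolution, fuse."""
    def __init__(self, channels: int, r: int = 2):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * r * r, 3, padding=1)  # C -> r^2 * C channels
        self.shuffle = nn.PixelShuffle(r)                                  # rearrange into r-times larger maps
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)            # final conventional fusion conv

    def forward(self, x):
        return self.fuse(self.shuffle(self.expand(x)))
```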
According to one or more embodiments, step A5 constructs a loss function based on the multi-scale feature fusion network, trains the network, and fine-tunes the network parameters. The specific process is as follows:
In the training process, the super-resolved light field multi-view images are compared one by one with the actual high-resolution light field multi-view images. The network adopts a leaky rectified linear unit (Leaky ReLU) with a leakage factor of 0.1 as the activation function, so as to avoid neurons that stop passing information during training. In the loss, u and v denote the horizontal and vertical position of a multi-view image within the N×N array, and s and t denote the pixel position along the x-axis and y-axis of the image, respectively.
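Because the loss formula is given only as an equation image in the source, the following is an assumed per-view L1 comparison consistent with the surrounding text, not the exact patented loss.

```python
import torch

def multiview_l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """sr, hr: [N*N, C, H, W] stacks of super-resolved and ground-truth views.
    Mean absolute difference over all views (u, v) and pixels (s, t)."""
    return (sr - hr).abs().mean()
```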
And step A6, specifically, inputting the low-resolution NxN light field multi-view image into the trained network to obtain the high-resolution NxN light field multi-view image.
The invention is discussed in terms of one or more embodiments implementing the method.
Training was performed using the Heidelberg University (Germany) light field dataset and the Stanford Lytro Illum light field camera dataset, with 5×5 light field multi-view image arrays. The training data were cut into 64×64-pixel low-resolution patches and 128×128-pixel high-resolution patches with a stride of 32 pixels. Data augmentation was performed by randomly flipping the images horizontally and vertically. The network was built and trained in the PyTorch framework; the model used the Adam optimization method and initialized the weights of each convolutional layer with the Xavier method. The initial learning rate was set to 2×10⁻⁴ and decayed by a factor of 0.5 every 20 epochs, and training was stopped after 80 epochs.
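The training configuration described above might look roughly as follows in PyTorch; model, train_loader and the loss function are placeholders, and the loop structure is an assumption.

```python
import torch
import torch.nn as nn

def xavier_init(m: nn.Module) -> None:
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)      # Xavier initialization of every conv layer
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def train(model, train_loader, loss_fn, device: str = "cuda") -> None:
    model.to(device).apply(xavier_init)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)                        # initial lr 2e-4
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)  # x0.5 every 20 epochs
    for epoch in range(80):                                                          # stop after 80 epochs
        for lr_views, hr_views in train_loader:
            sr = model(lr_views.to(device))
            loss = loss_fn(sr, hr_views.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()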
And carrying out comparative analysis on the trained network on the synthetic data set and the real data set respectively.
Fig. 5 shows a comparison table of bicubic interpolation and the method of the present invention under the PSNR and SSIM evaluation indexes on three different images from the synthetic dataset.
Fig. 6 shows a comparison table of bicubic interpolation and the method of the present invention under the PSNR and SSIM evaluation indexes on three different images from the real dataset.
Higher PSNR and SSIM values indicate a better super-resolution result. The results of these implementation examples show that the method brings a clear improvement in super-resolution quality.
It should be understood that, in the embodiment of the present invention, the term "and/or" is only one kind of association relation describing an associated object, and means that three kinds of relations may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A light field multi-view image super-resolution reconstruction method based on deep learning is characterized by comprising the following steps:
a1, constructing a training set of a high-resolution and low-resolution image pair by adopting multi-view images which are obtained from a light field camera or a light field camera array and distributed in an NxN array shape;
a2, constructing a multilayer characteristic extraction network from the NxN light field multi-view image array to the NxN light field multi-view characteristic image;
a3, stacking the characteristic images and constructing a characteristic fusion and enhancement multilayer convolution network to obtain 4D light field structural characteristics for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on a multi-scale feature fusion network, training, and finely adjusting network parameters;
a6, inputting the low-resolution NxN light field multi-view image into the trained network to obtain a high-resolution NxN light field multi-view image;
the step A2 comprises the following steps: performing low-level feature extraction on the low-resolution light field multi-view image G_LR via 1 conventional convolution and 1 residual block; performing multi-scale feature extraction and feature fusion on the extracted low-level features by using a ResASPP block and a residual block that alternate twice, thereby obtaining the mid-level features of each light field multi-view image;
the step A3 comprises the following steps:
step A3.1, stacking each view of the multi-scale feature map array Q_0 ∈ R^{NH×NW×C} along the channel dimension C in order from top-left to bottom-right, resulting in a feature map Q ∈ R^{H×W×(N×N×C)};
step A3.2, sending the stacked feature map Q ∈ R^{H×W×(N×N×C)} as input to a global feature fusion module, performing feature re-extraction on the stacked multi-scale features through 3 conventional convolutions, and performing feature fusion through 1 residual block;
step A3.3, entering a fusion block to realize feature enhancement, the fusion block accumulating more texture detail information onto the original features by extracting the angular features in the 4D light field, sending the enhanced features to 4 cascaded residual blocks for full feature fusion, and finally generating the 4D light field structural features used for super-resolution reconstruction of the light field images;
each peripheral-view feature Q_{s',t'} in the N×N feature map array generates a central-view feature Q'_{s,t} through the "warping transformation" W_{s't'→st}, and the central-view feature Q_{s,t} correspondingly generates the peripheral-view features through the "warping transformation" W_{st→s't'}, the process being expressed as:
Q'_{s,t} = W_{s't'→st} ⊗ Q_{s',t'},   Q'_{s',t'} = W_{st→s't'} ⊗ Q_{s,t}
where ⊗ denotes batch matrix multiplication; "mask" processing is then applied to the two generated feature blocks to handle the occlusion existing between different views, the mask matrix being obtained by taking the absolute value of the error term between the generated view and the original view, a larger absolute value indicating an occluded region, specifically:
M_{s't'→s,t} = 0 where |Q'_{s,t} - Q_{s,t}| > T, and 1 elsewhere
where T = 0.9 × max(‖Q'_{s,t} - Q_{s,t}‖₁) is an empirical threshold set in the algorithm and the mask matrix M_{st→s't'} is obtained in the same way as M_{s't'→s,t}; the occluded regions are then filtered out:
Q̂_{s,t} = M_{s't'→s,t} · Q'_{s,t},   Q̂_{s',t'} = M_{st→s't'} · Q'_{s',t'}
where Q̂_{s,t} and Q̂_{s',t'} are the feature blocks obtained after the mask processing; since n = N×N - 1 central-view feature maps are formed in the above process, they are normalized to obtain the feature map labeled (3):
Q̄_{s,t} = (1/n) Σ_{k=1}^{n} Q̂_{s,t}^{(k)}
where k is the index of the views other than the central view when the N×N feature map array is arranged from top-left to bottom-right, and Q̂_{s,t}^{(k)} denotes the k-th of these masked central-view feature maps; the feature map at the central position is replaced by the feature map labeled (3) to obtain the globally fused feature block (4), and the feature block (4) is added to the originally input multi-scale features to realize feature enhancement, finally yielding the feature block (5) after feature fusion and enhancement.
2. The method of claim 1, wherein the step A1 further comprises:
step A1.1, recording the multi-view images distributed in an N×N array as the high-resolution light field multi-view image G_HR, and performing 2-fold down-sampling on G_HR by bicubic interpolation to obtain the low-resolution N×N light field multi-view image G_LR;
step A1.2, cutting the low-resolution light field multi-view image G_LR into small patches with a spatial size of M×M pixels using a stride of K pixels, and correspondingly cutting the high-resolution light field multi-view image G_HR into patches of 2M×2M pixels;
step A1.3, performing normalization and regularization on the two light field multi-view images G_HR and G_LR respectively so that each pixel takes a value in [0, 1], thereby constituting the input data and ground-truth data of the deep learning network model.
3. The method of claim 2, wherein the step A2 further comprises:
the ResASPP block is formed by cascading 3 ASPP blocks with identical structural parameters and adding them to the upstream input in the form of a residual connection;
the ASPP block performs multi-scale feature extraction on the upstream input using mutually parallel atrous convolutions with different dilation rates;
in each ASPP block, 3 atrous convolutions extract features from the upstream input with dilation rates d = 1, 4, 8, respectively, and the resulting multi-scale features are then fused by a 1×1 convolution kernel.
4. The method of claim 1, wherein the step A4 further comprises:
step A4.1, using sub-pixel convolution to first generate a feature map with r²×C channels from the input feature map with C channels;
step A4.2, rearranging the obtained r²×C-channel feature map to generate a high-resolution feature map whose resolution is r times that of the input;
step A4.3, sending the high-resolution feature map to 1 conventional convolutional layer for feature fusion, finally generating the super-resolved light field multi-view image array.
5. The method according to claim 4, wherein the specific process of step A5 is:
in the training process, the super-resolved light field multi-view images are compared one by one with the actual high-resolution light field multi-view images, and the network adopts a leaky rectified linear unit with a leakage factor of 0.1 as the activation function so as to avoid neurons that stop passing information during training.
6. a light field camera system is characterized in that the system is based on deep learning for a light field multi-view image super-resolution reconstruction method, and the super-resolution reconstruction method comprises the following steps:
a1, constructing a training set of high-resolution and low-resolution image pairs by using multi-view images which are obtained from a light field camera and distributed in an NxN array shape;
a2, constructing a multilayer characteristic extraction network from an NxN light field multi-view image array to an NxN light field multi-view characteristic image;
a3, stacking the characteristic images, constructing a characteristic fusion and enhancement multilayer convolution network, and obtaining 4D light field structural characteristics which can be used for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on a multi-scale feature fusion network, training, and finely adjusting network parameters;
a6, inputting the low-resolution NxN light field multi-view image into the trained network to obtain a high-resolution NxN light field multi-view image;
the step A2 comprises the following steps: performing low-level feature extraction on the low-resolution light field multi-view image G_LR via 1 conventional convolution and 1 residual block; performing multi-scale feature extraction and feature fusion on the extracted low-level features by using a ResASPP block and a residual block that alternate twice, thereby obtaining the mid-level features of each light field multi-view image;
the step A3 comprises the following steps:
step A3.1, stacking each view of the multi-scale feature map array Q_0 ∈ R^{NH×NW×C} along the channel dimension C in order from top-left to bottom-right, resulting in a feature map Q ∈ R^{H×W×(N×N×C)};
step A3.2, sending the stacked feature map Q ∈ R^{H×W×(N×N×C)} as input to a global feature fusion module, performing feature re-extraction on the stacked multi-scale features through 3 conventional convolutions, and performing feature fusion through 1 residual block;
step A3.3, entering a fusion block to realize feature enhancement, the fusion block accumulating more texture detail information onto the original features by extracting the angular features in the 4D light field, sending the enhanced features to 4 cascaded residual blocks for full feature fusion, and finally generating the 4D light field structural features used for super-resolution reconstruction of the light field images;
each peripheral-view feature Q_{s',t'} in the N×N feature map array generates a central-view feature Q'_{s,t} through the "warping transformation" W_{s't'→st}, and the central-view feature Q_{s,t} correspondingly generates the peripheral-view features through the "warping transformation" W_{st→s't'}, the process being expressed as:
Q'_{s,t} = W_{s't'→st} ⊗ Q_{s',t'},   Q'_{s',t'} = W_{st→s't'} ⊗ Q_{s,t}
where ⊗ denotes batch matrix multiplication; "mask" processing is then applied to the two generated feature blocks to handle the occlusion existing between different views, the mask matrix being obtained by taking the absolute value of the error term between the generated view and the original view, a larger absolute value indicating an occluded region, specifically:
M_{s't'→s,t} = 0 where |Q'_{s,t} - Q_{s,t}| > T, and 1 elsewhere
where T = 0.9 × max(‖Q'_{s,t} - Q_{s,t}‖₁) is an empirical threshold set in the algorithm and the mask matrix M_{st→s't'} is obtained in the same way as M_{s't'→s,t}; the occluded regions are then filtered out:
Q̂_{s,t} = M_{s't'→s,t} · Q'_{s,t},   Q̂_{s',t'} = M_{st→s't'} · Q'_{s',t'}
where Q̂_{s,t} and Q̂_{s',t'} are the feature blocks obtained after the mask processing; since n = N×N - 1 central-view feature maps are formed in the above process, they are normalized to obtain the feature map labeled (3):
Q̄_{s,t} = (1/n) Σ_{k=1}^{n} Q̂_{s,t}^{(k)}
where k is the index of the views other than the central view when the N×N feature map array is arranged from top-left to bottom-right, and Q̂_{s,t}^{(k)} denotes the k-th of these masked central-view feature maps; the feature map at the central position is replaced by the feature map labeled (3) to obtain the globally fused feature block (4), and the feature block (4) is added to the originally input multi-scale features to realize feature enhancement, finally yielding the feature block (5) after feature fusion and enhancement.
7. A light field camera array system, characterized in that the system applies a light field multi-view image super-resolution reconstruction method based on deep learning, the super-resolution reconstruction method comprising the following steps:
a1, constructing a training set of high-resolution and low-resolution image pairs by using multi-view images which are obtained from a light field camera array and distributed in an NxN array shape;
a2, constructing a multilayer characteristic extraction network from the NxN light field multi-view image array to the NxN light field multi-view characteristic image;
a3, stacking the characteristic images and constructing a characteristic fusion and enhancement multilayer convolution network to obtain 4D light field structural characteristics for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on a multi-scale feature fusion network, training, and finely adjusting network parameters;
a6, inputting the low-resolution NxN light field multi-view image into the trained network to obtain a high-resolution NxN light field multi-view image;
the step A2 comprises the following steps: performing low-level feature extraction on the low-resolution light field multi-view image G_LR via 1 conventional convolution and 1 residual block; performing multi-scale feature extraction and feature fusion on the extracted low-level features by using a ResASPP block and a residual block that alternate twice, thereby obtaining the mid-level features of each light field multi-view image;
the step A3 comprises the following steps:
step A3.1, stacking each view of the multi-scale feature map array Q_0 ∈ R^{NH×NW×C} along the channel dimension C in order from top-left to bottom-right, resulting in a feature map Q ∈ R^{H×W×(N×N×C)};
step A3.2, sending the stacked feature map Q ∈ R^{H×W×(N×N×C)} as input to a global feature fusion module, performing feature re-extraction on the stacked multi-scale features through 3 conventional convolutions, and performing feature fusion through 1 residual block;
step A3.3, entering a fusion block to realize feature enhancement, the fusion block accumulating more texture detail information onto the original features by extracting the angular features in the 4D light field, sending the enhanced features to 4 cascaded residual blocks for full feature fusion, and finally generating the 4D light field structural features used for super-resolution reconstruction of the light field images;
each peripheral-view feature Q_{s',t'} in the N×N feature map array generates a central-view feature Q'_{s,t} through the "warping transformation" W_{s't'→st}, and the central-view feature Q_{s,t} correspondingly generates the peripheral-view features through the "warping transformation" W_{st→s't'}, the process being expressed as:
Q'_{s,t} = W_{s't'→st} ⊗ Q_{s',t'},   Q'_{s',t'} = W_{st→s't'} ⊗ Q_{s,t}
where ⊗ denotes batch matrix multiplication; "mask" processing is then applied to the two generated feature blocks to handle the occlusion existing between different views, the mask matrix being obtained by taking the absolute value of the error term between the generated view and the original view, a larger absolute value indicating an occluded region, specifically:
M_{s't'→s,t} = 0 where |Q'_{s,t} - Q_{s,t}| > T, and 1 elsewhere
where T = 0.9 × max(‖Q'_{s,t} - Q_{s,t}‖₁) is an empirical threshold set in the algorithm and the mask matrix M_{st→s't'} is obtained in the same way as M_{s't'→s,t}; the occluded regions are then filtered out:
Q̂_{s,t} = M_{s't'→s,t} · Q'_{s,t},   Q̂_{s',t'} = M_{st→s't'} · Q'_{s',t'}
where Q̂_{s,t} and Q̂_{s',t'} are the feature blocks obtained after the mask processing; since n = N×N - 1 central-view feature maps are formed in the above process, they are normalized to obtain the feature map labeled (3):
Q̄_{s,t} = (1/n) Σ_{k=1}^{n} Q̂_{s,t}^{(k)}
where k is the index of the views other than the central view when the N×N feature map array is arranged from top-left to bottom-right, and Q̂_{s,t}^{(k)} denotes the k-th of these masked central-view feature maps; the feature map at the central position is replaced by the feature map labeled (3) to obtain the globally fused feature block (4), and the feature block (4) is added to the originally input multi-scale features to realize feature enhancement, finally yielding the feature block (5) after feature fusion and enhancement.
8. A deep learning model construction method for super-resolution reconstruction of light field multi-view images is characterized by comprising the following steps:
a1, constructing a training set of high-resolution and low-resolution image pairs by using multi-view images which are acquired from a light field camera or a light field camera array and distributed in an NxN array shape;
a2, constructing a multilayer characteristic extraction network from the NxN light field multi-view image array to the NxN light field multi-view characteristic image;
a3, stacking the characteristic images, constructing a characteristic fusion and enhancement multilayer convolution network, and obtaining 4D light field structural characteristics which can be used for reconstructing light field multi-view images;
a4, constructing an up-sampling module to obtain a nonlinear mapping relation from the 4D light field structural characteristics to the high-resolution N multiplied by N light field multi-view image;
a5, constructing a loss function based on the multi-scale feature fusion network, training the network, and fine-tuning the network parameters;
the step A2 comprises the following steps: performing low-level feature extraction on the low-resolution light field multi-view image G_LR via 1 conventional convolution and 1 residual block; performing multi-scale feature extraction and feature fusion on the extracted low-level features by using a ResASPP block and a residual block that alternate twice, thereby obtaining the mid-level features of each light field multi-view image;
the step A3 comprises the following steps:
step A3.1, stacking each view of the multi-scale feature map array Q_0 ∈ R^{NH×NW×C} along the channel dimension C in order from top-left to bottom-right, resulting in a feature map Q ∈ R^{H×W×(N×N×C)};
step A3.2, sending the stacked feature map Q ∈ R^{H×W×(N×N×C)} as input to a global feature fusion module, performing feature re-extraction on the stacked multi-scale features through 3 conventional convolutions, and performing feature fusion through 1 residual block;
step A3.3, entering a fusion block to realize feature enhancement, the fusion block accumulating more texture detail information onto the original features by extracting the angular features in the 4D light field, sending the enhanced features to 4 cascaded residual blocks for full feature fusion, and finally generating the 4D light field structural features used for super-resolution reconstruction of the light field images;
each peripheral-view feature Q_{s',t'} in the N×N feature map array generates a central-view feature Q'_{s,t} through the "warping transformation" W_{s't'→st}, and the central-view feature Q_{s,t} correspondingly generates the peripheral-view features through the "warping transformation" W_{st→s't'}, the process being expressed as:
Q'_{s,t} = W_{s't'→st} ⊗ Q_{s',t'},   Q'_{s',t'} = W_{st→s't'} ⊗ Q_{s,t}
where ⊗ denotes batch matrix multiplication; "mask" processing is then applied to the two generated feature blocks to handle the occlusion existing between different views, the mask matrix being obtained by taking the absolute value of the error term between the generated view and the original view, a larger absolute value indicating an occluded region, specifically:
M_{s't'→s,t} = 0 where |Q'_{s,t} - Q_{s,t}| > T, and 1 elsewhere
where T = 0.9 × max(‖Q'_{s,t} - Q_{s,t}‖₁) is an empirical threshold set in the algorithm and the mask matrix M_{st→s't'} is obtained in the same way as M_{s't'→s,t}; the occluded regions are then filtered out:
Q̂_{s,t} = M_{s't'→s,t} · Q'_{s,t},   Q̂_{s',t'} = M_{st→s't'} · Q'_{s',t'}
where Q̂_{s,t} and Q̂_{s',t'} are the feature blocks obtained after the mask processing; since n = N×N - 1 central-view feature maps are formed in the above process, they are normalized to obtain the feature map labeled (3):
Q̄_{s,t} = (1/n) Σ_{k=1}^{n} Q̂_{s,t}^{(k)}
where k is the index of the views other than the central view when the N×N feature map array is arranged from top-left to bottom-right, and Q̂_{s,t}^{(k)} denotes the k-th of these masked central-view feature maps; the feature map at the central position is replaced by the feature map labeled (3) to obtain the globally fused feature block (4), and the feature block (4) is added to the originally input multi-scale features to realize feature enhancement, finally yielding the feature block (5) after feature fusion and enhancement.
CN202010284067.8A 2020-04-13 2020-04-13 Light field multi-view image super-resolution reconstruction method based on deep learning Active CN112750076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010284067.8A CN112750076B (en) 2020-04-13 2020-04-13 Light field multi-view image super-resolution reconstruction method based on deep learning


Publications (2)

Publication Number Publication Date
CN112750076A CN112750076A (en) 2021-05-04
CN112750076B (en) 2022-11-15

Family

ID=75645165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010284067.8A Active CN112750076B (en) 2020-04-13 2020-04-13 Light field multi-view image super-resolution reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN112750076B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256772B (en) * 2021-05-10 2023-08-01 华中科技大学 Double-angle light field high-resolution reconstruction system and method based on visual angle conversion
CN113506213B (en) * 2021-05-24 2024-01-23 北京航空航天大学 Light field image visual angle super-resolution method and device adapting to large parallax range
CN113379602B (en) * 2021-06-08 2024-02-27 中国科学技术大学 Light field super-resolution enhancement method using zero sample learning
CN113538307B (en) * 2021-06-21 2023-06-20 陕西师范大学 Synthetic aperture imaging method based on multi-view super-resolution depth network
CN113938668B (en) * 2021-09-07 2022-08-05 北京邮电大学 Three-dimensional light field display and model training method, device and storage medium
CN115187454A (en) * 2022-05-30 2022-10-14 元潼(北京)技术有限公司 Multi-view image super-resolution reconstruction method and device based on meta-imaging
CN114926339B (en) * 2022-05-30 2023-02-03 北京拙河科技有限公司 Light field multi-view image super-resolution reconstruction method and system based on deep learning
CN115909255B (en) * 2023-01-05 2023-06-06 北京百度网讯科技有限公司 Image generation and image segmentation methods, devices, equipment, vehicle-mounted terminal and medium
CN116823602B (en) * 2023-05-26 2023-12-15 天津大学 Parallax-guided spatial super-resolution reconstruction method for light field image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767386A (en) * 2018-12-22 2019-05-17 昆明理工大学 A kind of rapid image super resolution ratio reconstruction method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734660A (en) * 2018-05-25 2018-11-02 上海通途半导体科技有限公司 A kind of image super-resolution rebuilding method and device based on deep learning
CN109447919A (en) * 2018-11-08 2019-03-08 电子科技大学 In conjunction with the light field super resolution ratio reconstruction method of multi-angle of view and semantic textural characteristics
CN109829855A (en) * 2019-01-23 2019-05-31 南京航空航天大学 A kind of super resolution ratio reconstruction method based on fusion multi-level features figure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CrossNet: An End-to-End Reference-Based Super Resolution Network Using Cross-Scale Warping; Ferrari V et al.; Computer Vision - ECCV 2018; 2018-01-01 *
Learning Parallax Attention for Stereo Image Super-Resolution; Wang LG et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019); 2019-03-19 *
Image super-resolution reconstruction based on multi-scale feature loss function; Xu Liang et al.; Opto-Electronic Engineering; 2019-11-15 *

Also Published As

Publication number Publication date
CN112750076A (en) 2021-05-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
    Address after: 201100 room 1206, building 1, No. 951, Jianchuan Road, Minhang District, Shanghai
    Applicant after: Yimu (Shanghai) Technology Co.,Ltd.
    Address before: 201109 room 1103, building 1, 951 Jianchuan Road, Minhang District, Shanghai
    Applicant before: Yimu (Shanghai) Technology Co.,Ltd.
GR01 Patent grant
CP02 Change in the address of a patent holder
    Address after: Room 102, 1st Floor, Building 98, No. 1441 Humin Road, Minhang District, Shanghai, 2019; Room 302, 3rd Floor, Building 98; Room 402, 4th Floor, Building 98
    Patentee after: Yimu (Shanghai) Technology Co.,Ltd.
    Address before: 201100 room 1206, building 1, No. 951, Jianchuan Road, Minhang District, Shanghai
    Patentee before: Yimu (Shanghai) Technology Co.,Ltd.