CN109447919B - Light field super-resolution reconstruction method combining multi-view angle and semantic texture features

Info

Publication number
CN109447919B (application CN201811328290.7A)
Authority
CN
China
Prior art keywords: light field, resolution, field image, image, super
Legal status: Active
Application number
CN201811328290.7A
Other languages
Chinese (zh)
Other versions
CN109447919A (en)
Inventor
张汝民
蔡卫彤
张付停
陈建文
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201811328290.7A priority Critical patent/CN109447919B/en
Publication of CN109447919A publication Critical patent/CN109447919A/en
Application granted
Publication of CN109447919B publication Critical patent/CN109447919B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10052 Images from lightfield camera
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a light field super-resolution reconstruction method combining multi-view angle and semantic texture features, which comprises the following steps: step 1, inputting a low-resolution light field image; taking into account both the image quality after upsampled reconstruction and the spatial geometric relations among the views, upsampling it with single-frame and multi-frame super-resolution techniques respectively, fusing the two results, and outputting the upsampled light field image; step 2, converting the upsampled light field image into a set of two-dimensional epipolar plane images, optimizing its spatial structure with a three-layer fully convolutional neural network, and outputting the structure-optimized light field image; and step 3, performing semantic and texture optimization and correction on the structure-optimized light field image, and outputting the reconstructed super-resolution light field image. The method remedies the defect that images reconstructed by light field super-resolution fail to follow the multi-view imaging rules of spatial geometry, preserves the original semantic and texture features of each sub-image, and is robust with a good reconstruction effect.

Description

Light field super-resolution reconstruction method combining multi-view angle and semantic texture features
Technical Field
The invention belongs to the technical field of computer vision and light field imaging, and particularly relates to a light field super-resolution reconstruction method combining multi-view angles and semantic texture features.
Background
Compared with traditional imaging, light field imaging captures extended light information by recording both the spatial and the angular distribution of light, and the required images are computed through data-processing operations such as transformation and integration. A light field image therefore carries four-dimensional information, two spatial dimensions and two angular dimensions, enabling functions such as refocusing after shooting, aperture control after shooting, and depth estimation from a single shot. Early light field cameras were mainly camera arrays or mask-based light field cameras, which required relatively expensive equipment. In recent years, light field cameras based on microlens arrays (MLAs) have provided an economical and efficient way to acquire a light field: a microlens array inserted between the main lens and the sensor samples the light field, each microlens receiving the light transmitted by the main lens, which the sensor then records. However, because a single sensor is shared to capture both spatial and angular information, a microlens-array light field camera trades spatial resolution against angular resolution, which limits the improvement of both.
At present, scholars at home and abroad mainly improve the spatial resolution of light field images by adapting ideas from single-frame image super-resolution, realizing super-resolution either with traditional sparse-representation methods or with the recently dominant convolutional-neural-network methods, specially transformed to suit the characteristics of light field images. However, many light field super-resolution methods do not properly account for the geometric view relations between sub-images; that is, after super-resolution reconstruction the original geometric relations between sub-images are not well preserved. Meanwhile, existing methods that add structure optimization to the light field image fail to maintain the continuity of semantics and texture of a single sub-image before and after reconstruction. These problems remain to be solved.
Disclosure of Invention
The invention aims to solve two problems of existing light field super-resolution reconstruction methods: first, the geometric view relations between sub-images are not well considered, so the original geometric relations between the sub-images of the reconstructed light field are poorly preserved; second, existing methods that add structure optimization to the light field image cannot maintain the continuity of semantics and texture of a single sub-image before and after reconstruction, and thus cannot achieve high-quality imaging. To this end, the invention provides a light field super-resolution reconstruction method combining multi-view and semantic texture features.
The technical scheme adopted by the invention is as follows:
the light field super-resolution reconstruction method combining the multi-view angle and the semantic texture features comprises the following steps:
step 1, inputting a light field image with low resolution, giving consideration to the image quality after upsampling reconstruction and the space geometric relation among all visual angles, respectively utilizing single-frame and multi-frame super-resolution technologies to upsample the input light field image to reconstruct an image with an expected magnification, fusing the images after upsampling by utilizing the single-frame and multi-frame super-resolution technologies, and outputting the upsampled light field image of a multi-visual angle sub-image set;
2, converting the up-sampled light field image into a light field image represented by a two-dimensional polar line plane graph set, training the light field image by utilizing a three-layer full convolution neural network, further optimizing and correcting a spatial structure of the light field image represented by the two-dimensional polar line plane graph set, converting the light field image into a set of multi-view subgraphs, and outputting the light field image with the optimized spatial structure;
and 3, optimizing and correcting the semantics and the texture of the light field image after the spatial structure is optimized, and outputting the light field image after the optimization and the correction, namely the super-resolution light field image after the reconstruction is completed.
Further, the step 1 comprises the following steps:
step 1.1, carrying out image alignment registration on an input light field image;
step 1.2, upsampling the aligned and registered light field image with a single-frame super-resolution technique and, separately, with a multi-frame super-resolution technique, reconstructing images at the expected magnification; in the multi-frame technique, a pseudo-video-sequence generation method for the light field sub-image set is applied to satisfy the input condition of multi-frame super-resolution;
and step 1.3, fusing the images reconstructed by the two up-sampling methods in the step 1.2, and outputting the up-sampled light field image.
Further, in step 1.3, the image up-sampled by the single-frame super-resolution technology and the image up-sampled by the multi-frame super-resolution technology are fused by using an adaptive weighted fusion method.
Further, the image fusion adopts the following formula:
I_Up^(k)[i, j] = ω1[i, j] · I_SI_Up^(k)[i, j] + ω2[i, j] · I_V_Up^(k)[i, j]
wherein I_SI_Up^(k) represents the image upsampled by the single-frame super-resolution technique, I_V_Up^(k) represents the image upsampled by the multi-frame super-resolution technique, and ω1 and ω2 are matrices of the same size as I_SI_Up^(k) and I_V_Up^(k), satisfying ω1[i, j] + ω2[i, j] = 1 at every pixel position [i, j]; ω1 and ω2 are constructed adaptively and dynamically, with the PSNR and SSIM indexes as the weighting criteria.
Further, in step 2, a three-layer fully convolutional neural network is adopted to train on the light field image represented as a set of two-dimensional epipolar plane images and to further optimize and correct its spatial structure; the i-th convolutional layer of the network has n_i convolution kernels of size f_i × f_i, and each convolutional layer is followed by a ReLU activation layer. The specific process is as follows:
a training stage: the low-resolution light field image upsampled by bicubic interpolation serves as the label, the upsampled light field image obtained in step 1 serves as the input, and the three-layer fully convolutional network model is trained with the mean square error as the loss function, so that the network extracts more accurate spatial-structure features of the low-resolution light field image;
spatial structure optimization and correction: each two-dimensional epipolar plane image of the EPI-represented light field image is input into the trained three-layer fully convolutional network, the spatial structure of the light field image is optimized according to the extracted spatial-structure features of the low-resolution light field image, and the structure-optimized light field image is output.
Further, in the step 3, a three-layer full convolution neural network is adopted to perform semantic and texture optimization and correction on the light field image with the optimized spatial structure.
Further, the step 3 comprises the following steps:
step 3.1, inputting each sub-image of the structure-optimized light field image into a three-layer fully convolutional neural network, whose i-th convolutional layer has n_i convolution kernels of size f_i × f_i, each convolutional layer followed by a ReLU activation layer;
and step 3.2, outputting the light field image after semantic texture optimization and correction, and restoring the one-dimensional angular coordinate to two-dimensional coordinates to obtain the reconstructed super-resolution light field image.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The invention first uses a combination of single-frame and multi-frame super-resolution techniques to upsample each light field sub-image to the expected magnification while considering both the image quality after upsampled reconstruction and the spatial geometric relations among the views. It then uses spatial-feature representations such as the two-dimensional epipolar plane image, together with the constraints of multi-view geometry and the inherent rules of light field imaging, to optimize the pixel-level correspondence between sub-images and correct the light field image as a whole. At the same time, the continuity of semantics and texture of each single sub-image before and after reconstruction is considered: after the structure optimization and correction, a semantic and texture optimization operation is applied to every sub-image, and the super-resolved light field image is finally output. The method can thus reconstruct the light field image in accordance with the view relations among the sub-images, remedies the defect that images reconstructed by light field super-resolution fail to follow the multi-view imaging rules of spatial geometry, preserves the original semantic and texture features of each sub-image, and is robust with a good reconstruction effect;
2. The invention designs an adaptive weighted fusion method; generating the upsampled image by adaptive fusion also improves the performance indexes of the whole super-resolution method;
3. The invention uses a three-layer fully convolutional neural network to optimize the spatial structure of the upsampled light field image, and another three-layer fully convolutional network to apply the semantic texture optimization to every sub-image of the structure-optimized light field image. This largely eliminates artifacts that may appear in a single sub-image, such as ghosting and blurred boundaries between different depths of field, keeps the semantic and texture features of the input low-resolution sub-image in each single-view sub-image, and yields a better reconstruction;
4. The invention creatively applies a pseudo-video-sequence generation method for the light field sub-image set to the super-resolution pipeline, satisfying the multi-frame super-resolution input condition, realizing upsampled reconstruction of the light field image while respecting the spatial geometric relations among the views, and helping to maintain the original geometric relations among the sub-images of the reconstructed light field.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is an overall flow chart of the method of the present invention;
FIG. 2 is a process flow diagram of step 1 of the process of the present invention;
FIG. 3 is a process flow diagram of step 2 of the method of the present invention;
FIG. 4 is a method flow diagram of step 3 of the method of the present invention;
FIG. 5 is a block flow diagram of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
A light field super-resolution reconstruction method combining multi-view and semantic texture features is shown in a flow chart of the method in figure 1, and the method comprises the following steps:
step 1, inputting a low-resolution light field image; upsampling it with single-frame and multi-frame super-resolution techniques respectively, taking into account both the image quality after upsampled reconstruction and the spatial geometric relations among the views, reconstructing images at the expected magnification, fusing the two upsampled results, and outputting the upsampled light field image as a set of multi-view sub-images;
step 2, converting the upsampled light field image into its representation as a set of two-dimensional epipolar plane images (EPIs), training a three-layer fully convolutional neural network on this representation to further optimize and correct the spatial structure, converting the result back into a set of multi-view sub-images, and outputting the structure-optimized light field image;
and step 3, optimizing and correcting the semantics and texture of the structure-optimized light field image, and outputting the corrected result, i.e. the reconstructed super-resolution light field image.
The light field image can be described by a four-dimensional structure L(x, y, u, v), where x, y ∈ (1, 2, …, N) are the spatial-resolution coordinates, i.e. the pixel coordinates within a single sub-image, and u, v ∈ (1, 2, …, M) are the angular-resolution coordinates, i.e. the horizontal and vertical indices in the array of light field views. For convenience the two angular dimensions are taken to be of equal size, so M² is the number of views, i.e. the number of sub-images.
Given the input low-resolution light field image L_l(x, y, u, v), viewed in units of sub-images, the sub-image at horizontal view position i and vertical view position j can be written I_l^(i,j)(x, y). Converting the angular coordinates (u, v) into a linear index k ∈ (1, 2, …, M²) by k = (v − 1)·M + u, the same sub-image can be written I_l^(k)(x, y); the input is therefore the set of sub-images {I_l^(k)(x, y) | k = 1, 2, …, M²}.
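To make this indexing concrete, the following minimal sketch unpacks a 4D light field L[x, y, u, v] into its M² sub-images using k = (v − 1)·M + u. The patent prescribes no implementation; the array layout, function name, and use of NumPy are assumptions made for illustration.

```python
import numpy as np

def lightfield_to_subimages(L):
    """Unpack a 4D light field L[x, y, u, v] into its M*M sub-images,
    ordered by the linear view index k = (v - 1) * M + u (1-based)."""
    N, _, M, _ = L.shape                 # spatial N x N, angular M x M
    subimages = []
    for v in range(1, M + 1):            # vertical view coordinate
        for u in range(1, M + 1):        # horizontal view coordinate
            # k = (v - 1) * M + u is simply the append order here.
            subimages.append(L[:, :, u - 1, v - 1])  # sub-image I_l^(k)(x, y)
    return subimages

# Example: a 3 x 3 array of 64 x 64 views -> 9 sub-images.
L = np.random.rand(64, 64, 3, 3)
assert len(lightfield_to_subimages(L)) == 9
```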
Further, the step 1 comprises the following steps:
and 1.1, carrying out image alignment registration on the input light field image. Matching the images under different shooting conditions, modeling and analyzing the deflection relation between other images and a reference image to obtain the external parameters of the camera, and further correcting each sub-image to obtain Il_Re
And step 1.2, respectively performing upsampling on the aligned and registered light field image by using a single-frame super-resolution technology and performing upsampling by using a multi-frame super-resolution technology to reconstruct an image with an expected magnification.
Upsampling with the single-frame super-resolution technique: each sub-image I_l_Re^(k) of the aligned and registered light field image I_l_Re is taken as input and upsampled independently to obtain I_SI_Up^(k); the result is output as I_SI_Up.
Upsampling with the multi-frame super-resolution technique: multi-frame super-resolution takes several neighboring frames with spatio-temporal continuity as input and outputs a single upsampled picture. A pseudo-video-sequence generation method for the light field sub-image set is applied to satisfy this input condition, realizing upsampled reconstruction of the light field image while respecting the spatial geometric relations among the views. First, the pseudo-video-sequence generation module selects, from the spatially continuous sub-image set I_l_Re, the sub-image with the highest similarity to the sub-images of the other views as the key frame, based on the parallax magnitude; the remaining sub-images are sorted by their average per-pixel difference from this frame, and the sub-image set is rearranged and combined into a series of parallax-continuous pseudo video sequences that serve as input to the multi-frame super-resolution module.
In the multi-frame super-resolution technique, motion compensation is applied to the known view shifts of the input pseudo video sequence, and the extra detail contributed by sub-pixel shifts is exploited to upsample each sub-image of the sequence; the upsampled frames are then relabeled back into sub-image order and output as I_V_Up.
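As a minimal sketch of this rearrangement (the patent fixes neither the similarity measure beyond an average per-pixel difference nor the data layout, so the choices below are assumptions): the key frame is taken as the sub-image closest on average to all other views, and the remaining sub-images are ordered by their mean absolute difference from it.

```python
import numpy as np

def make_pseudo_video_sequence(subimages):
    """Rearrange a light field sub-image set into one parallax-continuous
    pseudo video sequence (sketch). The key frame is the sub-image with
    the smallest mean absolute difference to all other views; the rest
    are ordered by their mean per-pixel difference from that key frame,
    so neighboring 'frames' differ by the smallest parallax steps."""
    stack = np.stack(subimages)                       # (K, H, W)
    # Similarity of each view to all others (self-difference is zero).
    centrality = np.array([np.mean(np.abs(stack - stack[i]))
                           for i in range(len(stack))])
    key = int(np.argmin(centrality))                  # most central view
    order = np.argsort([np.mean(np.abs(s - stack[key])) for s in stack])
    return [subimages[i] for i in order]              # key frame comes first
```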
And step 1.3, fusing the images obtained by the two up-sampling methods in the step 1.2, and outputting the up-sampled light field image.
And fusing the image up-sampled by the single-frame super-resolution technology and the image up-sampled by the multi-frame super-resolution technology by using an adaptive weighted fusion method.
Further, the image fusion adopts the following formula:
I_Up^(k)[i, j] = ω1[i, j] · I_SI_Up^(k)[i, j] + ω2[i, j] · I_V_Up^(k)[i, j]
wherein I_SI_Up^(k) represents the image upsampled by the single-frame super-resolution technique, I_V_Up^(k) represents the image upsampled by the multi-frame super-resolution technique, and ω1 and ω2 are matrices of the same size as I_SI_Up^(k) and I_V_Up^(k), satisfying ω1[i, j] + ω2[i, j] = 1 at every pixel position [i, j]; ω1 and ω2 are constructed adaptively and dynamically, with the PSNR and SSIM indexes as the weighting criteria.
Finally, the upsampled light field image is output as I_Up = {I_Up^(k)(x, y)}, where x, y ∈ (1, 2, …, αN) and α is the upsampling factor.
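A minimal sketch of the adaptive weighted fusion follows. The patent states only that PSNR and SSIM serve as the weighting criteria; the proxy reference (e.g. the bicubic-upsampled input), the product combination of the two indexes, and the constant per-image weights are assumptions made for illustration.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fuse_adaptive(img_single, img_multi, reference):
    """Adaptive weighted fusion of the two upsampling branches (sketch).

    Each branch is scored against a proxy reference by PSNR * SSIM; the
    scores are normalized into weights with w1 + w2 = 1 and applied
    pixel-wise. Constant per-image weights are used here, whereas the
    patent's omega_1 and omega_2 are full weight matrices."""
    def score(img):
        psnr = peak_signal_noise_ratio(reference, img, data_range=1.0)
        ssim = structural_similarity(reference, img, data_range=1.0)
        return psnr * ssim       # one plausible way to combine the indexes

    s1, s2 = score(img_single), score(img_multi)
    w1 = np.full_like(img_single, s1 / (s1 + s2))   # omega_1
    w2 = 1.0 - w1                                   # omega_2, w1 + w2 = 1
    return w1 * img_single + w2 * img_multi         # fused sub-image I_Up^(k)
```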
The method flow chart of step 1 is shown in fig. 2.
This step uses single-frame and multi-frame super-resolution techniques to upsample the input light field image and improve its spatial resolution. Image alignment and calibration is performed first; the pipeline then splits into two channels, one using the single-frame super-resolution technique and the other the multi-frame technique. In the first channel, each sub-image of the light field image is reconstructed by a single-frame super-resolution method. In the second channel, in order to exploit the geometric view relations between sub-images, the sub-image set is first rearranged and combined by the pseudo-video-sequence generation module into a series of image sequences with parallax continuity; by motion-compensating the known view motion and exploiting the detail contributed by sub-pixel shifts, a multi-frame super-resolution method super-resolves all sub-images, i.e. the light field image. The outputs of the two channels are then fused by adaptive weighting to obtain the final upsampled light field image.
Further, step 2 specifically comprises:
Step 2.1, converting the light field image I_Up output by the upsampling of step 1 into its representation I_Up_trans as a set of two-dimensional epipolar plane images (EPIs).
Step 2.2, training a three-layer fully convolutional neural network on the EPI-represented light field image, further optimizing and correcting the spatial structure, and outputting the optimized disparity map and the EPI-represented light field image I_Refine1_trans = f_1(I_Up_trans).
The i-th convolutional layer of the three-layer fully convolutional network has n_i convolution kernels of size f_i × f_i, and each convolutional layer is followed by a ReLU activation layer. The specific process of step 2.2 is as follows:
a training stage: the low-resolution light field image upsampled by bicubic interpolation serves as the label, the upsampled light field image obtained in step 1 serves as the input, and the three-layer fully convolutional network model is trained with the mean square error as the loss function, so that the network extracts more accurate spatial-structure features of the low-resolution light field image;
spatial structure optimization and correction: each two-dimensional epipolar plane image of the EPI-represented light field image is input into the trained three-layer fully convolutional network, the spatial structure of the light field image is optimized according to the extracted spatial-structure features of the low-resolution light field image, and the structure-optimized light field image is output.
Step 2.3, the structure-optimized light field image obtained in step 2.2 is converted back into the sub-image-set representation and output as I_Refine1 = {I_Refine1^(k)}.
The method flow chart of step 2 is shown in fig. 3. As described above, one way of representing a light field image is as projections of the same scene from multiple views, i.e. a set of multi-view sub-images. Super-resolution under this representation requires consistency and continuity of the spatial structure between sub-images: the super-resolved light field image must have a spatial structure consistent with the input low-resolution sub-image set. Step 2 therefore further constrains the spatial structure of the upsampled light field image. Since the sub-image-set representation can only display the single image of each view individually, it cannot describe the spatial characteristics between views well; the characteristic spatial-structure connections between sub-images are therefore first described at the image level by two-dimensional epipolar plane images (EPIs). Using the extracted spatial-structure features of the low-resolution light field image as the standard, the upsampled light field image reconstructed in step 1 is further optimized and corrected, converted back into the sub-image-set representation, and output as the structure-optimized light field image.
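The following sketch shows how a horizontal EPI can be sliced from the 4D light field and what a three-layer fully convolutional network with per-layer ReLU looks like, together with the MSE training objective of the training stage. The patent leaves the kernel sizes f_i and kernel counts n_i unspecified, so the SRCNN-style values (9, 1, 5) and (64, 32, 1) below are illustrative assumptions, as is the use of PyTorch.

```python
import torch
import torch.nn as nn

def horizontal_epi(L, y, v):
    """Slice one horizontal epipolar plane image E(x, u) = L[x, y, u, v]
    from a light field tensor of shape (N, N, M, M)."""
    return L[:, y, :, v]                 # shape (N, M): spatial x vs angular u

class ThreeLayerFCN(nn.Module):
    """Three-layer fully convolutional network: the i-th layer has n_i
    kernels of size f_i x f_i, each followed by a ReLU layer. Padding
    keeps the EPI size unchanged so input and output align."""
    def __init__(self, f=(9, 1, 5), n=(64, 32, 1)):
        super().__init__()
        layers, in_ch = [], 1
        for fi, ni in zip(f, n):
            layers += [nn.Conv2d(in_ch, ni, fi, padding=fi // 2), nn.ReLU()]
            in_ch = ni
        self.net = nn.Sequential(*layers)

    def forward(self, epi):              # epi: (batch, 1, N, M)
        return self.net(epi)             # structure-corrected EPI

# Training-stage sketch: mean square error against the bicubic label.
model = ThreeLayerFCN()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```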
Further, in the step 3, a three-layer full convolution neural network is adopted to perform semantic and texture optimization and correction on the light field image with the optimized spatial structure.
Further, the step 3 comprises the following steps:
Step 3.1, each sub-image I_Refine1^(k) of the structure-optimized light field image I_Refine1 is input into a three-layer fully convolutional neural network, whose i-th convolutional layer has n_i convolution kernels of size f_i × f_i, each convolutional layer followed by a ReLU activation layer;
Step 3.2, the light field image after semantic texture optimization and correction, output by the neural network, can be expressed as I_Refine2 = f_2(I_Refine1); restoring the one-dimensional angular coordinate to two-dimensional coordinates yields the reconstructed super-resolution light field image L_SR(x, y, u, v) = F_SR(L_l(x, y, u, v)).
The method flow chart of step 3 is shown in fig. 4. As a basic requirement of the super-resolution method, the reconstructed image must not only preserve the spatial geometric continuity between views but also retain, in the sub-image of each single view, the semantic and texture features of the input low-resolution sub-image. By the analysis above, the light field image I_Refine1 output by the spatial structure optimization module is consistent in spatial structure, but because of the fine adjustments to that structure, the semantic and texture information within each single sub-image still contains errors and noise and mismatches the features of the low-resolution sub-image: for example, ghosting and blurred boundaries between different depths of field may appear in a single sub-image of I_Refine1. Step 3 applies a semantic texture optimization operation to each sub-image with a convolutional neural network and outputs the optimized and corrected light field image I_Refine2.
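A sketch of this last stage follows, reusing the ThreeLayerFCN above as the second network; the helper name and data layout are assumptions. Each structure-optimized sub-image is run through the network, and the linear view index k is restored to the two-dimensional angles (u, v):

```python
import numpy as np
import torch

def refine_subimages(subimages, model, M):
    """Semantic/texture refinement (sketch): run each structure-optimized
    sub-image through the second three-layer FCN, then invert the linear
    view index via u = (k - 1) % M + 1, v = (k - 1) // M + 1, the inverse
    of k = (v - 1) * M + u."""
    refined = {}
    with torch.no_grad():
        for k, sub in enumerate(subimages, start=1):
            x = torch.from_numpy(np.asarray(sub)).float()[None, None]
            out = model(x)[0, 0].numpy()     # refined sub-image
            u = (k - 1) % M + 1              # horizontal view coordinate
            v = (k - 1) // M + 1             # vertical view coordinate
            refined[(u, v)] = out            # entry of L_SR(x, y, u, v)
    return refined
```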
The steps of the method can be applied as a whole system comprising three modules, corresponding to the three steps of the method: an upsampling reconstruction module, a spatial structure optimization module, and a semantic texture optimization module. The schematic block flow diagram is shown in fig. 5.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A light field super-resolution reconstruction method combining multi-view and semantic texture features, characterized by comprising the following steps:
step 1, inputting a low-resolution light field image; taking into account both the image quality after upsampled reconstruction and the spatial geometric relations among the views, upsampling the input light field image with single-frame and multi-frame super-resolution techniques respectively to reconstruct images at the expected magnification, fusing the two upsampled results, and outputting the upsampled light field image as a set of multi-view sub-images;
step 2, converting the upsampled light field image into a light field image represented as a set of two-dimensional epipolar plane images, further optimizing the spatial structure of this representation with a three-layer fully convolutional neural network, converting the result back into a set of multi-view sub-images, and outputting the structure-optimized light field image;
and step 3, optimizing and correcting the semantics and texture of the structure-optimized light field image, and outputting the corrected result, i.e. the reconstructed super-resolution light field image.
2. The light field super-resolution reconstruction method combining multi-view and semantic texture features according to claim 1, characterized in that: the step 1 comprises the following steps:
step 1.1, carrying out image alignment registration on an input light field image;
step 1.2, upsampling the aligned and registered light field image with a single-frame super-resolution technique and, separately, with a multi-frame super-resolution technique, reconstructing images at the expected magnification; in the multi-frame technique, a pseudo-video-sequence generation method for the light field sub-image set is applied to satisfy the input condition of multi-frame super-resolution;
and step 1.3, fusing the images reconstructed by the two up-sampling methods in the step 1.2, and outputting the up-sampled light field image.
3. The light field super-resolution reconstruction method combining multi-view and semantic texture features according to claim 1 or 2, characterized in that: in the step 1, the image sampled by the single-frame super-resolution technology and the image sampled by the multi-frame super-resolution technology are fused by using a self-adaptive weighted fusion method.
4. The light field super-resolution reconstruction method combining multi-view and semantic texture features according to any one of claims 1-3, characterized in that the image fusion adopts the following formula:
I_Up^(k)[i, j] = ω1[i, j] · I_SI_Up^(k)[i, j] + ω2[i, j] · I_V_Up^(k)[i, j]
wherein I_SI_Up^(k) represents the image upsampled by the single-frame super-resolution technique, I_V_Up^(k) represents the image upsampled by the multi-frame super-resolution technique, and ω1 and ω2 are matrices of the same size as I_SI_Up^(k) and I_V_Up^(k), satisfying ω1[i, j] + ω2[i, j] = 1; ω1 and ω2 are constructed adaptively and dynamically, with the PSNR and SSIM indexes as the weighting criteria.
5. The light field super-resolution reconstruction method combining multi-view and semantic texture features according to claim 1, characterized in that in step 2 a three-layer fully convolutional neural network is adopted to train on the light field image represented as a set of two-dimensional epipolar plane images and to further optimize and correct the spatial structure, the i-th convolutional layer of the network having n_i convolution kernels of size f_i × f_i, each convolutional layer followed by a ReLU activation layer, the specific process being as follows:
a training stage: the low-resolution light field image upsampled by bicubic interpolation serves as the label, the upsampled light field image obtained in step 1 serves as the input, and the three-layer fully convolutional network model is trained with the mean square error as the loss function, so that the network extracts more accurate spatial-structure features of the low-resolution light field image;
spatial structure optimization and correction: each two-dimensional epipolar plane image of the EPI-represented light field image is input into the trained three-layer fully convolutional network, the spatial structure of the light field image is optimized according to the extracted spatial-structure features of the low-resolution light field image, and the structure-optimized light field image is output.
6. The light field super-resolution reconstruction method combining multi-view and semantic texture features according to claim 1, characterized in that in step 3 a three-layer fully convolutional neural network is adopted to perform semantic and texture optimization and correction on the structure-optimized light field image.
7. The light field super resolution reconstruction method combining multi-view and semantic texture features according to claim 1 or 4, characterized in that: the step 3 comprises the following steps:
step 3.1, inputting each sub-image of the structure-optimized light field image into a three-layer fully convolutional neural network, whose i-th convolutional layer has n_i convolution kernels of size f_i × f_i, each convolutional layer followed by a ReLU activation layer;
and step 3.2, outputting the light field image after semantic texture optimization and correction, and restoring the one-dimensional angular coordinate to two-dimensional coordinates to obtain the reconstructed super-resolution light field image.
CN201811328290.7A 2018-11-08 2018-11-08 Light field super-resolution reconstruction method combining multi-view angle and semantic texture features Active CN109447919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811328290.7A CN109447919B (en) 2018-11-08 2018-11-08 Light field super-resolution reconstruction method combining multi-view angle and semantic texture features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811328290.7A CN109447919B (en) 2018-11-08 2018-11-08 Light field super-resolution reconstruction method combining multi-view angle and semantic texture features

Publications (2)

Publication Number Publication Date
CN109447919A CN109447919A (en) 2019-03-08
CN109447919B true CN109447919B (en) 2022-05-06

Family

ID=65552558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811328290.7A Active CN109447919B (en) 2018-11-08 2018-11-08 Light field super-resolution reconstruction method combining multi-view angle and semantic texture features

Country Status (1)

Country Link
CN (1) CN109447919B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533594B (en) * 2019-08-30 2023-04-07 Oppo广东移动通信有限公司 Model training method, image reconstruction method, storage medium and related device
CN112750076B (en) * 2020-04-13 2022-11-15 奕目(上海)科技有限公司 Light field multi-view image super-resolution reconstruction method based on deep learning
WO2022016350A1 (en) * 2020-07-21 2022-01-27 Oppo广东移动通信有限公司 Light field image processing method, light field image encoder and decoder, and storage medium
CN112102165B (en) * 2020-08-18 2022-12-06 北京航空航天大学 Light field image angular domain super-resolution system and method based on zero sample learning
US11543654B2 (en) * 2020-09-16 2023-01-03 Aac Optics Solutions Pte. Ltd. Lens module and system for producing image having lens module
CN112767246B (en) * 2021-01-07 2023-05-26 北京航空航天大学 Multi-multiplying power spatial super-resolution method and device for light field image
CN112785502B (en) * 2021-01-25 2024-04-16 江南大学 Light field image super-resolution method of hybrid camera based on texture migration
CN115423946B (en) 2022-11-02 2023-04-07 清华大学 Large scene elastic semantic representation and self-supervision light field reconstruction method and device
CN116721222B (en) * 2023-08-10 2023-10-31 清华大学 Large-scale light field semantic driving intelligent characterization and real-time reconstruction method
CN118154430A (en) * 2024-05-10 2024-06-07 清华大学 Space-time-angle fusion dynamic light field intelligent imaging method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074218A (en) * 2017-12-29 2018-05-25 清华大学 Image super-resolution method and device based on optical field acquisition device
CN108615221A (en) * 2018-04-10 2018-10-02 清华大学 Light field angle super-resolution rate method and device based on the two-dimentional epipolar plane figure of shearing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011237997A (en) * 2010-05-10 2011-11-24 Sony Corp Image processing device, and image processing method and program
US10366480B2 (en) * 2016-07-01 2019-07-30 Analytical Mechanics Associates, Inc. Super-resolution systems and methods

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074218A (en) * 2017-12-29 2018-05-25 清华大学 Image super-resolution method and device based on optical field acquisition device
CN108615221A (en) * 2018-04-10 2018-10-02 清华大学 Light field angle super-resolution rate method and device based on the two-dimentional epipolar plane figure of shearing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image Super-Resolution Algorithm Based on Deep Deconvolutional Neural Networks; Peng Yali et al.; Journal of Software (软件学报); 2017-12-04 (No. 04); full text *

Also Published As

Publication number Publication date
CN109447919A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109447919B (en) Light field super-resolution reconstruction method combining multi-view angle and semantic texture features
CN111311490B (en) Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN108074218B (en) Image super-resolution method and device based on light field acquisition device
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN108921781B (en) Depth-based optical field splicing method
CN110070489A (en) Binocular image super-resolution method based on parallax attention mechanism
CN112465955A (en) Dynamic human body three-dimensional reconstruction and visual angle synthesis method
CN106023230B (en) A kind of dense matching method of suitable deformation pattern
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN112785502B (en) Light field image super-resolution method of hybrid camera based on texture migration
CN115115516B (en) Real world video super-resolution construction method based on Raw domain
CN112950475A (en) Light field super-resolution reconstruction method based on residual learning and spatial transformation network
CN111476714B (en) Cross-scale image splicing method and device based on PSV neural network
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN114757862B (en) Image enhancement progressive fusion method for infrared light field device
CN116957931A (en) Method for improving image quality of camera image based on nerve radiation field
Wu et al. Depth mapping of integral images through viewpoint image extraction with a hybrid disparity analysis algorithm
CN114359041A (en) Light field image space super-resolution reconstruction method
CN112862675A (en) Video enhancement method and system for space-time super-resolution
CN116523757A (en) Light field image super-resolution model based on generation countermeasure network and training method thereof
CN116503245A (en) Multi-scale fusion and super-resolution reconstruction method for coal rock digital core sequence pictures
CN114998405A (en) Digital human body model construction method based on image drive
CN111586316A (en) Method for generating stereoscopic element image array based on spherical camera array
CN110769242A (en) Full-automatic 2D video to 3D video conversion method based on space-time information modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant