CN112365400A - Rapid super-resolution reconstruction method for light field angle - Google Patents

Rapid super-resolution reconstruction method for light field angle

Info

Publication number: CN112365400A (published 2021-02-12); granted as CN112365400B (2024-05-28)
Application number: CN202011164974.5A
Authority: CN (China)
Prior art keywords: images, image, preset, view, network
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 王兴政, 昝永强, 游森林, 邓元龙
Current Assignee: Shenzhen University
Original Assignee: Shenzhen University
Application filed by Shenzhen University
Priority date / Filing date: 2020-10-27

Classifications

    • G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06T3/4046 — Scaling of whole images or parts thereof using neural networks
    • G06T7/55 — Depth or shape recovery from multiple images
    • G06T2207/10028 — Range image; depth image; 3D point clouds
    • G06T2207/10052 — Images from lightfield camera
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30 — Subject of image; context of image processing


Abstract

The invention discloses a fast light field angle super-resolution reconstruction method. Two input view images are determined according to the position of a preset view; the two input view images are fed into a trained image reconstruction network, which processes them to produce a new view image; only horizontal parallax exists between the view of the new view image and the views of the two input view images. Because each new view image is synthesized from only two input view images, and those input views differ from the new view only by horizontal parallax, the reconstruction speed is increased and computational resources are saved while the quality of the new view image is preserved. In addition, the invention can reconstruct a new view image at any position.

Description

Rapid super-resolution reconstruction method for light field angle
Technical Field
The invention relates to the field of light field imaging, and in particular to a fast light field angle super-resolution reconstruction method.
Background
A light field camera can record multiple view images in a single exposure, but there is a trade-off between the spatial resolution and the angular resolution of the light field images. It is therefore necessary to study how to improve the angular resolution of the light field while maintaining a high spatial resolution.
Light field angle super-resolution reconstruction is also called light field new view synthesis. Traditional light field angle super-resolution reconstruction methods fall into two categories. The first uses a depth estimation algorithm to obtain an accurate depth map and then maps the known view images to the new view position to reconstruct the new view. The second treats light field angle super-resolution reconstruction as sampling and reconstruction of the plenoptic function, regarding each pixel of the known view images as a sample of the high-dimensional plenoptic function. However, both methods are susceptible to non-Lambertian surfaces. In recent years, deep learning has made great progress in image denoising, super-resolution, deblurring and related tasks, and convolutional neural networks (CNNs) have begun to be applied to light field angle super-resolution reconstruction. Yoon et al. were the first to apply a convolutional neural network to light field angle super-resolution reconstruction; their method reconstructs a new view image using information from adjacent multi-view images, but it can only reconstruct the middle view of adjacent views and cannot reconstruct a new view image at an arbitrary position. Kalantari et al. proposed a learning-based light field new view reconstruction method that improves the quality of the reconstructed views, but because four input view images are required to synthesize each new view, view synthesis is slow.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the above deficiencies of the prior art, an object of the present invention is to provide a fast light field angle super-resolution reconstruction method that improves the reconstruction speed and saves computational resources while ensuring the quality of the reconstructed new view image. According to the geometric principle of view interpolation, the intermediate view of two views can be obtained from their parallax information. The input views are therefore selected so that only two input view images are needed to synthesize each new view image, which reduces the amount of computation and increases the synthesis speed of the new view.
The technical scheme of the invention is as follows:
a fast light field angle super-resolution reconstruction method comprises the following steps:
acquiring two input visual angle images according to the position of a preset visual angle image; the preset visual angle and the input visual angles corresponding to the two input visual angle images only have horizontal parallax;
inputting the two input visual angle images into a trained image reconstruction network, and processing the two input visual angle images through the image reconstruction network to obtain a new visual angle image;
and the view angle corresponding to the new view angle image is a preset view angle.
The fast light field angle super-resolution reconstruction method is characterized in that the position of the preset visual angle is between the two input visual angles or the positions of the two input visual angles.
The fast light field angle super-resolution reconstruction method includes that the trained image reconstruction network includes a trained depth estimation network and a trained color estimation network, and the two input view images are processed by the image reconstruction network to obtain a new view image, including:
inputting the two input visual angle images into a trained depth estimation network, and obtaining the depth information of a new visual angle image through the trained depth estimation network;
and inputting the depth information of the new visual angle image and the two input visual angle images into the trained color estimation network, and obtaining the new visual angle image through the trained color estimation network.
The fast light field angle super-resolution reconstruction method, wherein the depth estimation network includes a trained first convolution neural network, and the obtaining of the depth information of the new view angle image through the depth estimation network includes:
presetting a plurality of depth levels according to the maximum parallax range of the preset visual angle and the input visual angle;
according to the depth information of the depth levels, feature extraction is carried out on the two input visual angle images to obtain a depth feature set;
and inputting the depth feature set into a first convolution neural network, and obtaining the depth information of the new visual angle image through the first convolution neural network.
The fast light field angle super-resolution reconstruction method, wherein the extracting features of the two input view images according to the depth information of the plurality of depth levels to obtain a depth feature set, includes:
obtaining two first mapping images corresponding to each depth level according to the depth information of each depth level and the two input view images;
obtaining a mean value and a standard deviation corresponding to each depth grade according to the two first mapping images corresponding to each depth grade;
and forming a set by the mean values and the standard deviations corresponding to all the depth levels to obtain a depth feature set.
The fast light field angle super-resolution reconstruction method, wherein the trained color estimation network includes a second convolutional neural network, and obtaining a new view angle image through the trained color estimation network includes:
obtaining a color feature set according to the depth information of the new view angle image, the position of the preset view angle and the two input view angle images;
and inputting the color feature set into the trained second convolutional neural network, and obtaining a new visual angle image through the trained second convolutional neural network.
The fast light field angle super-resolution reconstruction method, wherein obtaining a color feature set according to the depth information of the new view image, the position of the preset view and the two input view images, comprises:
obtaining two second mapping images according to the depth information of the new view angle image, the position of the preset view angle and the two input view angle images;
and combining the two second mapping images, the depth information of the new view angle image and the position of the preset view angle into a set to obtain a color feature set.
The fast light field angle super-resolution reconstruction method obtains two second mapping images according to the depth information of the new view image, the position of the preset view and the two input view images, and comprises the following steps:
obtaining the parallax between the two input view angle images and the new view angle image according to the depth information of the new view angle image and the position of the preset view angle;
and obtaining two second mapping images according to the parallax and the two input visual angle images.
The fast light field angle super-resolution reconstruction method comprises the following steps of:
preprocessing image samples in a light field data set to obtain a plurality of groups of training image groups, wherein each group of training image groups comprises two first images and second images corresponding to the two first images, the visual angles of the first images and the second images only have parallax in the horizontal direction, and the visual angle position of the second images is between the visual angle positions of the two first images;
inputting the two first images into a preset image reconstruction network to obtain reconstructed images corresponding to the two first images; and the preset image reconstruction network corrects network parameters of the preset image reconstruction network according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images, and continues to execute the step of obtaining the reconstructed images corresponding to the two first images according to the two first images until the training condition of the preset image reconstruction network meets a preset condition so as to obtain the trained image reconstruction network.
The fast light field angle super-resolution reconstruction method includes the steps of correcting network parameters of a preset image reconstruction network according to reconstructed images corresponding to the two first images and second images corresponding to the two first images, and continuing to execute the step of obtaining reconstructed images corresponding to the two first images according to the two first images until a training condition of the preset image reconstruction network meets a preset condition, so as to obtain a trained image reconstruction network, and includes the steps of:
obtaining error function values according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images;
and training the preset image reconstruction network according to the error function value, and continuing to execute the step of obtaining reconstructed images corresponding to the two first images according to the two first images until the training condition of the preset image reconstruction network meets a preset condition so as to obtain the trained image reconstruction network.
Beneficial effects: the invention provides a fast light field angle super-resolution reconstruction method in which two input view images are determined according to the position of a preset view; the two input view images are input into a trained image reconstruction network and processed by the image reconstruction network to obtain a new view image; and only horizontal parallax exists between the view of the new view image and the views of the two input view images. Because only two view images are used as input and only horizontal parallax exists between the two input view images and the new view image, the reconstruction speed is increased and computational resources are saved while the quality of the new view image is ensured. In addition, the invention can reconstruct a new view image at any position.
Drawings
Fig. 1 is a first flowchart of a fast light field angle super-resolution reconstruction method according to a preferred embodiment of the present invention.
Fig. 2 is a second flowchart of a fast light field angle super-resolution reconstruction method according to a preferred embodiment of the present invention.
Fig. 3 is a schematic diagram of parallax.
Fig. 4 is a schematic diagram of the positional relationship between a preset view and the input views.
Detailed Description
The invention provides a fast light field angle super-resolution reconstruction method. To make the purpose, technical scheme and effects of the invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
The invention provides a fast light field angle super-resolution reconstruction method, as shown in Fig. 1, which is a first flowchart of a preferred embodiment of the fast light field angle super-resolution reconstruction method of the invention. The method comprises the steps:
s10, determining two input visual angle images according to the position of the preset visual angle;
and S20, inputting the two input view images into a trained image reconstruction network, and processing the two input view images through the image reconstruction network to obtain a new view image.
Specifically, as shown in Fig. 3, which is a schematic diagram of parallax, X denotes a point in space, C and C' denote the left and right cameras, and parallax denotes the difference between the images of the same spatial point in different cameras. For light field multi-view images, view images in the same row have only horizontal parallax, and view images in the same column have only vertical parallax. According to the geometric principle of view interpolation, if the preset view and the input views corresponding to the two input view images have only horizontal parallax, the intermediate view image of the two input view images, i.e. the new view image corresponding to the preset view, can be obtained from the parallax information. Therefore, the two input view images are selected according to the preset view, so that only two input view images are needed to synthesize the new view image, which reduces the amount of computation and increases the synthesis speed of the new view.
The view corresponding to the new view image is the preset view, and the views corresponding to the input view images are the input views. As shown in Fig. 4, which is a schematic diagram of the positional relationship between the preset view and the input views, the view position corresponding to the black images lies between the view positions corresponding to the white images; both the black and the white image positions can serve as the position of the preset view, and the view positions corresponding to the white images are the positions of the input views. The position of a preset view is selected, two input views with only horizontal parallax relative to the preset view are determined according to that position, and the two input images corresponding to the two input views are acquired; the position of the preset view lies between the two input views or coincides with one of them. For example, if the 3rd (black) image in the 1st row is selected as the new view image, both input view images are taken from the 1st row: the 1st or 2nd image of the row can be used as one input view image, and the 4th, 5th, 6th, 7th or 8th image of the row can be used as the other input view image.
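A minimal sketch of this view-selection step is given below; the 8 × 8 view grid (matching the training data described later), the zero-based indexing, the `select_input_views` helper name and the choice of the leftmost and rightmost views of the row are assumptions for illustration, not details fixed by the patent.

```python
# Minimal sketch: pick two input views that share only horizontal parallax with
# a preset (target) view, i.e. two views from the same row of the view grid.
def select_input_views(target_row, target_col, grid_cols=8):
    """Return (row, col) indices of two input views bracketing the target view."""
    left_col, right_col = 0, grid_cols - 1
    assert left_col <= target_col <= right_col
    return (target_row, left_col), (target_row, right_col)

# Example: the 3rd view of the 1st row (zero-based (0, 2)) is reconstructed
# from the leftmost and rightmost views of that row.
print(select_input_views(0, 2))   # ((0, 0), (0, 7))
```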
In this embodiment, the trained image reconstruction network can reconstruct the images corresponding to all view positions between the two input views in the horizontal direction from two input view images sharing the same horizontal view direction, so that a new view image of higher quality is obtained.
Further, the trained image reconstruction network comprises a trained depth estimation network and a trained color estimation network, and step S20 comprises:
S21, inputting the two input view images into the trained depth estimation network, and obtaining the depth information of the new view image through the trained depth estimation network;
S22, inputting the depth information of the new view image and the two input view images into the trained color estimation network, and obtaining the new view image through the trained color estimation network.
Specifically, the trained depth estimation network extracts depth features at the preset view from the two input images and obtains the depth information at the preset view from these depth features. The trained color estimation network obtains color features at the preset view from the depth information at the preset view and obtains the new view image from the color features. The network structures of the trained depth estimation network and the trained color estimation network are shown in Fig. 2.
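For concreteness, a sketch of the two sub-networks follows; the patent specifies only their inputs and outputs (and Fig. 2 for the overall structure), so the layer counts, kernel sizes, channel widths and the default of 100 depth levels below are assumptions for illustration, not the claimed architecture.

```python
# Sketch of the two fully convolutional sub-networks (architectures assumed).
import torch.nn as nn

class DepthEstimationNet(nn.Module):
    """Maps the depth feature set K (2*l channels) to the depth map D_q (1 channel)."""
    def __init__(self, num_depth_levels=100):   # l; 100 is an assumed value
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_depth_levels, 100, 7, padding=3), nn.ReLU(),
            nn.Conv2d(100, 100, 5, padding=2), nn.ReLU(),
            nn.Conv2d(100, 50, 3, padding=1), nn.ReLU(),
            nn.Conv2d(50, 1, 1),
        )
    def forward(self, k):
        return self.net(k)

class ColorEstimationNet(nn.Module):
    """Maps the color feature set H (two warped RGB views + depth + view position,
    i.e. 3*2 + 1 + 2 = 9 channels) to the new RGB view L_q."""
    def __init__(self, in_channels=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 100, 7, padding=3), nn.ReLU(),
            nn.Conv2d(100, 100, 5, padding=2), nn.ReLU(),
            nn.Conv2d(100, 50, 3, padding=1), nn.ReLU(),
            nn.Conv2d(50, 3, 1),
        )
    def forward(self, h):
        return self.net(h)
```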
Further, the trained depth estimation network comprises a first convolutional neural network, and step S21 comprises:
S211, presetting a plurality of depth levels according to the maximum parallax range between the preset view and the input views;
S212, performing feature extraction on the two input view images according to the depth information of the depth levels to obtain a depth feature set;
S213, inputting the depth feature set into the first convolutional neural network, and obtaining the depth information of the new view image through the first convolutional neural network.
Specifically, in order to map the input images to the preset view position according to the depth information of the new view image and thereby obtain the new view image, and because depth information × baseline (the distance between the new view and an input view) equals parallax, the depth information range of the new view image can be obtained from the maximum parallax range between the preset view and the input views. The maximum parallax refers to the larger of the parallaxes between the preset view and the two input views. For example, as shown in Fig. 4, if the view corresponding to the 3rd (black) image in the 1st row is selected as the preset view and the views corresponding to the two white images in the 1st row are selected as the two input views, the parallax between the right white image and the black image is the largest; if that parallax is 21, the maximum parallax range is [-21, 21]. Because the true value of the depth information is unknown, the depth information must be estimated within this range. A plurality of depth levels are preset within the depth information range of the new view image, each depth level corresponding to one depth value $d_1, d_2, \cdots, d_l$, where $l$ is the number of depth levels. After the depth levels are preset, the two input view images are each mapped according to the depth information of the different depth levels, giving two first mapped images for each depth level. The first mapped image is given by

$$\bar{L}_{p_i}^{d_j}(s) = L_{p_i}\big(s + (p_i - q)\, d_j\big)$$

where $\bar{L}_{p_i}^{d_j}$ denotes a first mapped image, $L_{p_i}$ denotes an input view image, $s$ denotes the position in the pixel coordinate system, $p_i$ denotes the position of the input view, $q$ denotes the position of the preset view, $d_j$ denotes the depth information of each depth level, $i \in \{1,2\}$, $j \in \{1,2,\cdots,l\}$, and $l$ is the number of depth levels.

To extract features, the mean and standard deviation of the two first mapped images at each depth level are calculated. The mean $M_{d_j}$ and standard deviation $V_{d_j}$ of each depth level are, respectively,

$$M_{d_j}(s) = \frac{1}{2}\sum_{i=1}^{2} \bar{L}_{p_i}^{d_j}(s)$$

$$V_{d_j}(s) = \sqrt{\frac{1}{2}\sum_{i=1}^{2} \Big(\bar{L}_{p_i}^{d_j}(s) - M_{d_j}(s)\Big)^{2}}$$

where $\bar{L}_{p_1}^{d_j}$ and $\bar{L}_{p_2}^{d_j}$ denote the two first mapped images. The means and standard deviations corresponding to all depth levels are calculated and combined into the depth feature set $K = \{M_{d_1}, V_{d_1}, M_{d_2}, V_{d_2}, \cdots, M_{d_l}, V_{d_l}\}$; since there are $l$ depth levels, the depth feature set contains $2l$ feature vectors. Because only two input view images are used, and only horizontal parallax exists between the input views corresponding to the two input view images and the preset view corresponding to the new view image, the time required to extract the depth features is small, which shortens the image reconstruction time.

After the depth feature set is obtained, it is input into the first convolutional neural network, and the depth information $D_q$ at the preset view is estimated from the depth feature set as

$$D_q = g_d(K)$$

where $D_q$ denotes the depth information of the new view image and $g_d$ denotes the mapping from the depth feature set to the depth information of the new view image.
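The depth feature extraction described above can be sketched as follows; the grayscale inputs, integer-pixel shifts, wrap-around borders via `np.roll`, and the helper names `warp_to_preset_view` / `depth_feature_set` are simplifying assumptions for illustration.

```python
# Sketch of depth feature extraction: warp the two input views to the preset view
# at each candidate depth level, then collect per-level mean and standard deviation.
import numpy as np

def warp_to_preset_view(view, view_pos, preset_pos, depth):
    """First mapped image: sample the input view at s + (p_i - q) * d_j.

    np.roll(a, -k) gives out[s] = a[s + k], hence the negated shift; sub-pixel
    interpolation and proper border handling are omitted for brevity.
    """
    offset = int(round((view_pos - preset_pos) * depth))
    return np.roll(view, -offset, axis=1)            # horizontal parallax only

def depth_feature_set(views, view_positions, preset_pos, depth_levels):
    """Build K = {M_d1, V_d1, ..., M_dl, V_dl} as a (2*l, H, W) array."""
    features = []
    for d in depth_levels:
        warped = [warp_to_preset_view(v, p, preset_pos, d)
                  for v, p in zip(views, view_positions)]
        mean = 0.5 * (warped[0] + warped[1])                    # M_d
        std = np.sqrt(0.5 * ((warped[0] - mean) ** 2
                             + (warped[1] - mean) ** 2))        # V_d
        features += [mean, std]
    return np.stack(features)
```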
Further, the color estimation network comprises a second convolutional neural network, and step S22 comprises:
S221, obtaining a color feature set according to the depth information of the new view image, the position of the preset view and the two input view images;
S222, inputting the color feature set into the second convolutional neural network, and obtaining the new view image through the trained second convolutional neural network.
Specifically, after the depth information of the new view image is obtained, the two input view images are each mapped to the position of the preset view according to the depth information of the new view image, giving two second mapped images corresponding to the depth information. The mapping formula is

$$\bar{L}_{p_i}(s) = L_{p_i}\big(s + (p_i - q)\, D_q(s)\big)$$

where $D_q(s)$ denotes the depth information of the new view image, $i \in \{1,2\}$, and $(p_i - q) D_q$ denotes the parallax between the input view corresponding to the input image and the preset view corresponding to the new view image. The two second mapped images $\bar{L}_{p_1}$ and $\bar{L}_{p_2}$, the depth information $D_q$ of the new view image and the position $q$ of the preset view form the color feature set $H = \{\bar{L}_{p_1}, \bar{L}_{p_2}, D_q, q\}$. The depth information of each pixel of the new view image corresponds to a pixel of each input image, so the color information of the new view image can be estimated from the input view images. The color feature set $H$ is input into the second convolutional neural network to obtain the final new view image:

$$L_q = g_c(H)$$

where $L_q$ denotes the new view image and $g_c$ denotes the mapping between the color feature set $H$ and the new view image $L_q$.
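A sketch of this color stage follows; the differentiable warp via `grid_sample`, the tensor layouts and the helper names are assumptions made for illustration rather than the patented implementation.

```python
# Sketch of the color estimation stage: warp each input view with the estimated
# per-pixel depth D_q, then stack the warped views, D_q and the preset view
# position into the color feature set H.
import torch
import torch.nn.functional as F

def warp_with_depth(view, view_col, preset_col, depth):
    """Second mapped image. view: (1, 3, H, W); depth D_q: (1, 1, H, W)."""
    _, _, h, w = view.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    src_x = xs + (view_col - preset_col) * depth[0, 0]   # s + (p_i - q) * D_q(s)
    grid_x = 2.0 * src_x / (w - 1) - 1.0                 # normalize to [-1, 1]
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0).float()
    return F.grid_sample(view, grid, align_corners=True)

def color_feature_set(views, view_cols, preset_pos, depth):
    """H = {warped view 1, warped view 2, D_q, q}: a (1, 3*2 + 1 + 2, H, W) tensor."""
    warped = [warp_with_depth(v, c, preset_pos[1], depth)
              for v, c in zip(views, view_cols)]
    _, _, h, w = depth.shape
    pos = torch.empty(1, 2, h, w)
    pos[:, 0], pos[:, 1] = float(preset_pos[0]), float(preset_pos[1])
    return torch.cat(warped + [depth, pos], dim=1)

# The second CNN then maps H to the new view: new_view = color_net(color_feature_set(...))
```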
In one implementation, the trained image reconstruction network is trained by the following steps:
m1, preprocessing image samples in the optical field data set to obtain a plurality of training image groups, where each training image group includes two first images and second images corresponding to the two first images, the viewing angles of the first images and the second images only have horizontal parallax or vertical parallax, and the viewing angle position of the second image is between the viewing angle positions of the two first images.
Specifically, the network training is performed using a Stanford light field dataset, the data in the Stanford light field dataset being light field images, the light field images being acquired by a LytroIllum light field camera, the actual angular resolution of which is 14 × 14, that is, the obtained light field images are multi-view images composed of 14 × 14 view angle images. Since the edge view images of the multi-view image are affected by noise, vignetting effect, and the like, the central 8 × 8 view images are employed as the light field data set. The training set contained 100 light field images and the test set contained 30 light field images. Since the training speed of the complete image is very slow, the image blocks with the size of 60 × 60 are extracted from the complete image, the step size is 16 pixels, and therefore more than 100000 image blocks are obtained to train the network. Before network training, each image block needs to be preprocessed, that is, each image block is processed into a multi-view image, and in the multi-view image, an image with the same horizontal direction view angle is used as a training image group. In each training image group, two pictures corresponding to the leftmost and rightmost visual angles are used as first images, and other images are used as second images.
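The preprocessing described above can be sketched as follows; the values (central 8 × 8 views, 60 × 60 patches, stride 16, leftmost/rightmost views of a row as the two first images) come from the text, while the `(14, 14, H, W, 3)` array layout and the `extract_training_groups` helper name are assumptions about how a decoded Lytro Illum light field is stored.

```python
# Sketch of the dataset preprocessing: crop to the central 8x8 views and cut
# 60x60 patches with stride 16; each row of views yields one training group.
import numpy as np

def extract_training_groups(light_field, patch=60, stride=16):
    views = light_field[3:11, 3:11]                   # keep the central 8x8 views
    _, _, h, w, _ = views.shape
    groups = []
    for row in range(8):                              # one group per horizontal row
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                p = views[row, :, y:y + patch, x:x + patch]    # (8, 60, 60, 3)
                first_images = (p[0], p[7])           # leftmost and rightmost views
                second_images = p[1:7]                # ground-truth in-between views
                groups.append((first_images, second_images))
    return groups
```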
M2, inputting the two first images into a preset image reconstruction network to obtain reconstructed images corresponding to the two first images; the preset network model corrects its model parameters according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images, and continues to execute the step of obtaining reconstructed images corresponding to the two first images from the two first images until the training condition of the preset network model meets the preset condition, so as to obtain the trained image reconstruction network.
Specifically, the preset image reconstruction network comprises a preset depth estimation network and a preset color estimation network. The two first images are input into the preset image reconstruction network to obtain the reconstructed images $\hat{L}_q$ corresponding to the two first images, and the error between the reconstructed images $\hat{L}_{q,k}$ and the second images $L_{q,k}$ corresponding to the two first images is used as the error function. The preset depth estimation network and the preset color estimation network are trained simultaneously according to the value of this error function, which is

$$E = \sum_{k} \big(\hat{L}_{q,k} - L_{q,k}\big)^{2}$$

where $k$ indexes the three RGB channels. To minimize the error function with the gradient descent method, the partial derivatives of the error $E$ with respect to the weights of the two networks, i.e. $\partial E / \partial w_d$ and $\partial E / \partial w_c$, must be calculated, where $w_d$ and $w_c$ denote the weights of the preset depth estimation network and of the preset color estimation network, respectively.

Since the output of the preset color estimation network is the reconstructed image, $\partial E / \partial w_c$ can be computed directly with standard error back-propagation, whereas $\partial E / \partial w_d$ cannot be computed directly and is split into three parts using the chain rule:

$$\frac{\partial E}{\partial w_d} = \frac{\partial E}{\partial \hat{L}_q}\,\frac{\partial \hat{L}_q}{\partial D_q}\,\frac{\partial D_q}{\partial w_d}$$

where $\partial E / \partial \hat{L}_q$ is the partial derivative of the error $E$ with respect to the reconstructed image and can be computed directly; $\partial D_q / \partial w_d$ is the partial derivative of the depth information $D_q$ (the output of the preset depth estimation network) with respect to the weights $w_d$ of the preset depth estimation network and can also be computed directly; and $\partial \hat{L}_q / \partial D_q$ is the partial derivative of the reconstructed image with respect to the depth information. Since the color feature set $H$ is obtained from the depth information $D_q$, and the reconstructed image $\hat{L}_q$ is in turn obtained from the color feature set $H$, it follows that

$$\frac{\partial \hat{L}_q}{\partial D_q} = \sum_{t} \frac{\partial \hat{L}_q}{\partial H_t}\,\frac{\partial H_t}{\partial D_q}$$

where $\partial \hat{L}_q / \partial H_t$ is the partial derivative of the reconstructed image $\hat{L}_q$ with respect to the $t$-th channel of the color feature set $H$, and $\partial H_t / \partial D_q$ is the partial derivative of the $t$-th channel of the color feature set with respect to the depth information $D_q$. The partial derivatives $\partial \hat{L}_q / \partial H_t$ and $\partial H_t / \partial D_q$ corresponding to each channel must be calculated separately: the first $3N$ channels are the mapped images prepared for the preset color estimation network, the $(3N+1)$-th channel is the depth information itself, for which the partial derivative is 1, and the last two channels encode the position of the reconstructed view, are independent of depth, and therefore have a partial derivative of 0. Error back-propagation is performed through the above steps, the weights are then updated, and these steps are repeated to train the two networks simultaneously until the training condition of the preset reconstruction network meets the preset condition, i.e. the error function value is minimized, so as to obtain the trained image reconstruction network.
In summary, the present invention provides a fast light field angle super-resolution reconstruction method in which two input view images are determined according to the position of a preset view; the two input view images are input into a trained image reconstruction network and processed by the image reconstruction network to obtain a new view image; and only horizontal parallax exists between the view of the new view image and the views of the two input view images. Because only two view images are used as input and only horizontal parallax exists between the two input view images and the new view image, the reconstruction speed of the new view image is increased and computational resources are saved. In addition, the invention can reconstruct a new view image at any position.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A fast light field angle super-resolution reconstruction method, characterized by comprising the following steps:
acquiring two input view images according to the position of a preset view, wherein only horizontal parallax exists between the preset view and the input views corresponding to the two input view images;
inputting the two input view images into a trained image reconstruction network, and processing the two input view images through the image reconstruction network to obtain a new view image, wherein the view corresponding to the new view image is the preset view.
2. The fast light field angle super-resolution reconstruction method according to claim 1, wherein the position of the preset view lies between the two input views or coincides with the position of one of the two input views.
3. The fast light field angle super-resolution reconstruction method according to claim 2, wherein the trained image reconstruction network comprises a trained depth estimation network and a trained color estimation network, and processing the two input view images through the image reconstruction network to obtain a new view image comprises:
inputting the two input view images into the trained depth estimation network, and obtaining the depth information of the new view image through the trained depth estimation network;
inputting the depth information of the new view image and the two input view images into the trained color estimation network, and obtaining the new view image through the trained color estimation network.
4. The fast light field angle super-resolution reconstruction method according to claim 3, wherein the trained depth estimation network comprises a first convolutional neural network, and obtaining the depth information of the new view image through the depth estimation network comprises:
presetting a plurality of depth levels according to the maximum parallax range between the preset view and the input views;
performing feature extraction on the two input view images according to the depth information of the depth levels to obtain a depth feature set;
inputting the depth feature set into the first convolutional neural network, and obtaining the depth information of the new view image through the first convolutional neural network.
5. The fast light field angle super-resolution reconstruction method according to claim 4, wherein performing feature extraction on the two input view images according to the depth information of the plurality of depth levels to obtain a depth feature set comprises:
obtaining two first mapped images corresponding to each depth level according to the depth information of that depth level and the two input view images;
obtaining a mean and a standard deviation corresponding to each depth level according to the two first mapped images corresponding to that depth level;
forming a set from the means and standard deviations corresponding to all depth levels to obtain the depth feature set.
6. The fast light field angle super-resolution reconstruction method according to claim 3, wherein the trained color estimation network comprises a second convolutional neural network, and obtaining the new view image through the trained color estimation network comprises:
obtaining a color feature set according to the depth information of the new view image, the position of the preset view and the two input view images;
inputting the color feature set into the trained second convolutional neural network, and obtaining the new view image through the trained second convolutional neural network.
7. The fast light field angle super-resolution reconstruction method according to claim 6, wherein obtaining a color feature set according to the depth information of the new view image, the position of the preset view and the two input view images comprises:
obtaining two second mapped images according to the depth information of the new view image, the position of the preset view and the two input view images;
combining the two second mapped images, the depth information of the new view image and the position of the preset view into a set to obtain the color feature set.
8. The fast light field angle super-resolution reconstruction method according to claim 7, wherein obtaining two second mapped images according to the depth information of the new view image, the position of the preset view and the two input view images comprises:
obtaining the parallax between the two input view images and the new view image according to the depth information of the new view image and the position of the preset view;
obtaining the two second mapped images according to the parallax and the two input view images.
9. The fast light field angle super-resolution reconstruction method according to claim 1, wherein the trained image reconstruction network is trained by the following steps:
preprocessing image samples in a light field data set to obtain a plurality of training image groups, wherein each training image group comprises two first images and the second images corresponding to the two first images, only horizontal parallax exists between the views of the first images and the second images, and the view position of each second image lies between the view positions of the two first images;
inputting the two first images into a preset image reconstruction network to obtain reconstructed images corresponding to the two first images; the preset image reconstruction network corrects its network parameters according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images, and continues to execute the step of obtaining reconstructed images corresponding to the two first images from the two first images until the training condition of the preset image reconstruction network meets a preset condition, so as to obtain the trained image reconstruction network.
10. The fast light field angle super-resolution reconstruction method according to claim 9, wherein correcting the network parameters of the preset image reconstruction network according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images, and continuing to execute the step of obtaining reconstructed images corresponding to the two first images from the two first images until the training condition of the preset image reconstruction network meets a preset condition so as to obtain the trained image reconstruction network, comprises:
obtaining an error function value according to the reconstructed images corresponding to the two first images and the second images corresponding to the two first images;
training the preset image reconstruction network according to the error function value, and continuing to execute the step of obtaining reconstructed images corresponding to the two first images from the two first images until the training condition of the preset image reconstruction network meets a preset condition, so as to obtain the trained image reconstruction network.
CN202011164974.5A 2020-10-27 2020-10-27 Rapid light field angle super-resolution reconstruction method Active CN112365400B (en)

Priority Applications (1)

Application number: CN202011164974.5A — Priority date: 2020-10-27 — Filing date: 2020-10-27 — Title: Rapid light field angle super-resolution reconstruction method (granted as CN112365400B)

Applications Claiming Priority (1)

Application number: CN202011164974.5A — Priority date: 2020-10-27 — Filing date: 2020-10-27 — Title: Rapid light field angle super-resolution reconstruction method (granted as CN112365400B)

Publications (2)

Publication Number Publication Date
CN112365400A true CN112365400A (en) 2021-02-12
CN112365400B CN112365400B (en) 2024-05-28

Family

ID=74510409

Family Applications (1)

Application number: CN202011164974.5A (Active, granted as CN112365400B) — Priority date: 2020-10-27 — Filing date: 2020-10-27 — Title: Rapid light field angle super-resolution reconstruction method

Country Status (1)

Country Link
CN (1) CN112365400B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075779A (en) * 2011-02-21 2011-05-25 北京航空航天大学 Intermediate view synthesizing method based on block matching disparity estimation
CN108280814A (en) * 2018-02-08 2018-07-13 重庆邮电大学 Light field image angle super-resolution rate method for reconstructing based on perception loss
KR20200021891A (en) * 2018-08-21 2020-03-02 삼성전자주식회사 Method for the synthesis of intermediate views of a light field, system for the synthesis of intermediate views of a light field, and method for the compression of a light field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIMA KHADEMI KALANTARI ET AL.: "Learning-Based View Synthesis for Light Field Cameras", ACM TRANSACTIONS ON GRAPHICS, vol. 35, no. 6, pages 3 - 7 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926339A (en) * 2022-05-30 2022-08-19 北京拙河科技有限公司 Light field multi-view image super-resolution reconstruction method and system based on deep learning
CN114926339B (en) * 2022-05-30 2023-02-03 北京拙河科技有限公司 Light field multi-view image super-resolution reconstruction method and system based on deep learning

Also Published As

Publication number Publication date
CN112365400B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN106780590B (en) Method and system for acquiring depth map
TW202004679A (en) Image feature extraction method and saliency prediction method including the same
CN111105432B (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
CN110517306B (en) Binocular depth vision estimation method and system based on deep learning
CN113762358B (en) Semi-supervised learning three-dimensional reconstruction method based on relative depth training
CN110910437B (en) Depth prediction method for complex indoor scene
CN106023230B (en) A kind of dense matching method of suitable deformation pattern
CN111047709B (en) Binocular vision naked eye 3D image generation method
CN111899295B (en) Monocular scene depth prediction method based on deep learning
US11727628B2 (en) Neural opacity point cloud
CN111105452B (en) Binocular vision-based high-low resolution fusion stereo matching method
CN106408596A (en) Edge-based local stereo matching method
CN112288788A (en) Monocular image depth estimation method
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN116310111A (en) Indoor scene three-dimensional reconstruction method based on pseudo-plane constraint
CN111861888A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112365400B (en) Rapid light field angle super-resolution reconstruction method
CN111369435B (en) Color image depth up-sampling method and system based on self-adaptive stable model
WO2024032331A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN112785502A (en) Light field image super-resolution method of hybrid camera based on texture migration
CN111696167A (en) Single image super-resolution reconstruction method guided by self-example learning
CN116385577A (en) Virtual viewpoint image generation method and device
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant