CN117475088A - Light field reconstruction model training method based on polar plane attention and related equipment

Publication number: CN117475088A
Authority: CN (China)
Prior art keywords: data, light field, convolution, dimensional, attention
Legal status: Granted; Active
Application number: CN202311785291.5A
Other languages: Chinese (zh)
Other versions: CN117475088B
Inventors: 李宁, 居法银, 朱虎
Assignees: Jiangsu Youzhong Micro Nano Semiconductor Technology Co ltd; Zhejiang Unisom New Material Technology Co ltd
Application filed by Jiangsu Youzhong Micro Nano Semiconductor Technology Co ltd and Zhejiang Unisom New Material Technology Co ltd; priority to CN202311785291.5A; granted and published as CN117475088B.

Classifications

    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods for computing arrangements based on biological models (neural networks)
    • G06T3/4046: Scaling of whole images or parts thereof using neural networks
    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a light field reconstruction model training method based on polar plane attention and related equipment, in the technical field of image processing. The method comprises the following steps: acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set; calculating an attention map, and taking the product of the attention map and a third convolution result as polar plane attention data; taking the polar plane attention data as guide map data, and performing guided filtering based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data; and constructing training set data from the three-dimensional light field data and the target three-dimensional light field data, and performing model training with the training set data to obtain a light field super-resolution reconstruction model. The method yields a light field super-resolution reconstruction model and improves the continuity of the light field in the angular dimension.

Description

Light field reconstruction model training method based on polar plane attention and related equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a light field reconstruction model training method based on polar plane attention and related equipment.
Background
The light field parameterizes the position and direction information of the four-dimensional optical radiation field in space and therefore contains richer image information than traditional imaging, which records only two dimensions. To collect light field information, researchers have developed a series of acquisition systems based on light field cameras, which capture the light field in a single exposure of a single camera. A light field camera divides the aperture of the single camera so as to image the light field from multiple angles on the same plane, generating sub-aperture images that record the light field. Many implementations of light field cameras have been proposed, but regardless of the structure employed, the angular resolution of a light field camera is obtained by sacrificing its spatial resolution. For a fixed camera volume, increasing the angular resolution inevitably shrinks the aperture of each sub-lens, reducing brightness while limiting the resolution of the sub-aperture images. Conversely, increasing the aperture reduces the angular resolution and the depth of field, and affects imaging stability.
Disclosure of Invention
The light field reconstruction model training method and related equipment based on polar plane attention provided by the embodiments of the invention can at least realize light field super-resolution and improve the continuity of the light field in the angular dimension.
According to one aspect of the present application, there is provided a light field reconstruction model training method based on polar plane attention, the method comprising: acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set; performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on a first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on a second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficient, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data; calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map is used to describe the correspondence between polar plane image data in the three-dimensional light field data; taking the polar plane attention data as guide map data, and performing guided filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data; and constructing training set data using the three-dimensional light field data and the target three-dimensional light field data, and performing model training using the training set data to obtain a light field super-resolution reconstruction model.
According to an aspect of the present application, there is also provided a light field reconstruction method based on polar plane attention, including: acquiring light field data to be processed; processing the light field data to be processed by utilizing a light field reconstruction model based on polar plane attention to obtain a light field reconstruction result; the light field reconstruction model based on the polar plane attention is obtained according to the method.
According to an aspect of the present application, there is also provided a light field reconstruction model training apparatus based on polar plane attention, including: a data module for acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set; a convolution module for performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on a first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on a second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficient, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data; an attention module for calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map is used to describe the correspondence between polar plane image data in the three-dimensional light field data; a guided filtering module for taking the polar plane attention data as guide map data, and performing guided filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data; and a training module for constructing training set data using the three-dimensional light field data and the target three-dimensional light field data, and performing model training using the training set data to obtain a light field super-resolution reconstruction model.
According to another aspect of the present application, there is also provided an electronic apparatus including: a processor; and a memory storing a program, wherein the program comprises instructions that when executed by the processor cause the processor to perform a method according to the above.
According to another aspect of the present application, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method steps according to the above.
The embodiment of the invention has the beneficial effects that:
according to the embodiment of the invention, firstly, a light field data set and a network reconstruction coefficient are obtained, polar plane image data are generated based on the light field data set, the polar plane image data are grouped according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set, a first convolution operation and a second convolution operation are respectively carried out on the three-dimensional light field data based on a first convolution check, so that phase characteristics of the three-dimensional light field data are extracted, a first convolution result and a second convolution result are obtained, further, an attention map is calculated based on the first convolution result and the second convolution result, the attention map is used for describing a corresponding relation between the polar plane image data in the three-dimensional light field data, a third convolution result with higher angular resolution is obtained based on the network reconstruction coefficient and the second convolution check, the product of the attention map and the third convolution result is used as polar plane attention data, the attention map data is used as guide map data, the three-dimensional light field data are subjected to guide filtering processing, further, the three-dimensional light field data and the target three-dimensional light field data with high angular resolution are obtained, the attention map data and the target three-dimensional light field data are utilized, the attention map data are constructed, the attention map data is further, the super-resolution light field model is obtained, and the light field can be continuously reconstructed in the angular resolution, and the light field can be realized, and the super-resolution model is realized, and the light field can be continuously reconstructed.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below; other features, objects, and advantages of the invention will become more apparent from the description and the drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the invention, from which other embodiments can be obtained for a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a training method of a light field reconstruction model based on polar plane attention according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for implementing an embodiment of the present invention;
FIG. 3 is a schematic diagram of an angle resolution reconstruction effect according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a reconstruction model structure according to an embodiment of the present invention;
FIG. 5 is a graph showing the quantitative analysis and comparison of PSNR in the examples of the present invention;
FIG. 6 is a comparison chart of SSIM quantitative analysis of an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the invention are illustrated in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the invention. The drawings and embodiments are presented for purposes of illustration only and are not intended to limit the scope of the invention.
The technique of converting an optical signal collected by a device into a digital image is called optical imaging. Optical imaging technology is the basis of computer vision. With the development of electronic technology, modern optical imaging technology is undergoing tremendous change. Classical imaging systems simulate the working principle of the human eye, projecting a 3D scene onto a 2D sensor plane to obtain a 2D image. However, the depth information of the 3D scene is lost when the light rays are projected onto the 2D plane, and depth information has been shown to effectively improve imaging capability; recovering it breaks the limitation that traditional imaging systems place on the development of computer vision technology.
Super-resolution reconstruction (SR) is a technique for recovering a high-resolution image from one or more low-resolution images. The super-resolution concept was first proposed by Harris and Goodman in the 1960s in the context of image optics, and refers to restoring, from a single image, image information lost beyond the limiting resolution imposed by the band-limited transfer function of the optical system.
A set of position coordinates (x, y) in a four-dimensional light field, combined with a set of direction coordinates (s, t), uniquely determines a ray; the four-dimensional light field L(x, y, s, t) can therefore be regarded as the state space of rays. A point in the state space corresponds to a ray in the real-world light field and can thus represent the light field. A two-dimensional image obtained by collecting light field data at a fixed angular coordinate of the light field is called a viewpoint image (VI). The viewpoint image records the observation of the scene at a fixed angle and captures the main texture information in the scene. A two-dimensional slice of the light field obtained by fixing one angular dimension and one spatial dimension is called a polar plane image (EPI). Because of the parallax relationship between different viewpoint images, the same visible object point in the scene forms a continuous straight line in the EPI. This straight line effectively reflects the geometric consistency of the interior of the light field image. Therefore, realizing light field super-resolution based on the correlation between polar plane images can make full use of the spatial information and phase information of the light field to learn its overall structure.
Guided filtering is a linear filtering method that preserves edges and edge details while smoothing. The guided filter computes linear filter coefficients from the image to be processed and an additional guide map, producing an output image q by linear operations such that the gradient of q is similar to that of the guide map while its gray levels are similar to those of the image to be processed. The algorithmic complexity of guided filtering is independent of the window size, so its efficiency advantage is significant when processing large images. At the same time, guided filtering largely avoids the gradient reversal phenomenon seen in bilateral filtering.
Against this background, the invention provides a light field reconstruction model training method based on polar plane attention and related equipment. The invention effectively uses light field phase information to improve the light field angular resolution while improving the accuracy and robustness of super-resolution reconstruction. Compared with general light field super-resolution methods, it addresses the problems of low angular resolution and poor angular continuity of the light field, which cause information loss and unstable super-resolution reconstruction, and thus facilitates effective super-resolution reconstruction.
The embodiment of the invention provides a light field reconstruction model training method based on polar plane attention, which can use TensorFlow (an open-source machine learning library) as its implementation platform. FIG. 1 is a flow chart of a light field reconstruction model training method based on polar plane attention according to an embodiment of the present application; the steps involved in FIG. 1 are described below.
Step S101, acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set.
In the embodiment of the invention, the light field data set is a four-dimensional data set comprising angular dimension data and spatial dimension data. The light field data set may be an existing data set; for example, the Stanford (New) Light Field dataset may be employed. It contains a series of light field images that capture a scene from different perspectives, so that a user can view the same scene from multiple viewpoints. Specifically, the Stanford (New) Light Field dataset contains 12 light field scenes in total, each light field containing 17×17 views at a resolution of 1024×1024. The network reconstruction coefficient may be given by a user and determines the degree of angular resolution reconstruction.
For a 4D light field L(x, y, s, t), where x and y are spatial dimensions and s and t are angular dimensions, a polar plane image (EPI) is obtained by using one spatial dimension and one angular dimension as a coordinate pair: for example, E_{y,t}(x, s), with y and t fixed, is determined by the spatial dimension x and the angular dimension s. Alternatively, x and s may be fixed, and polar plane images determined based on y and t. Performing this processing on the light field data set yields multiple groups of polar plane image data; that is, polar plane image data are generated based on the light field data set.
A set of polar plane image data is used to determine a polar plane image. Multiple sets of polar plane image data may be included in a set of light field data.
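As a concrete illustration of this slicing step, the following is a minimal NumPy sketch of EPI extraction and grouping; the (S, T, Y, X) array layout, the reduced placeholder shapes, and the function names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

# Assumed layout: a grayscale 4D light field L(x, y, s, t) stored as an
# array of shape (S, T, Y, X) -- angular dims s, t and spatial dims y, x.
S, T, Y, X = 17, 17, 64, 64   # full Stanford views are 1024 x 1024; small here
light_field = np.zeros((S, T, Y, X), dtype=np.float32)  # placeholder data

def epi(lf, y, t):
    """Polar plane image E_{y,t}(x, s): fix spatial row y and angular
    coordinate t; the slice varies over x and s."""
    return lf[:, t, y, :]                      # shape (S, X)

def horizontal_3d_light_field(lf, t):
    """One group of 3D light field data: fix angular coordinate t;
    stacking its EPIs over y gives the 3D slice."""
    return lf[:, t, :, :]                      # shape (S, Y, X)

# Grouping by the angular direction t yields T horizontal 3D light fields;
# grouping by s instead yields S vertical ones.
horizontal_groups = [horizontal_3d_light_field(light_field, t) for t in range(T)]
vertical_groups = [light_field[s] for s in range(S)]   # shape (T, Y, X) each
```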
After the polar plane image data are determined, the polar plane image data are grouped according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set. The specified angular direction may include a first specified angular direction and/or a second specified angular direction, chosen according to actual requirements.
In this step, the 4D light field data set is converted into 3D light field data, thereby reducing the complexity of subsequent computations.
Step S102, performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficient, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data.
In the embodiment of the invention, the first convolution kernel is used for extracting the phase characteristic data of the three-dimensional light field data. The first convolution kernel may be a 1×1×1 3D convolution kernel. The first convolution operation, the second convolution operation, and the third convolution operation may each comprise a plurality of convolution processes. The second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficient: for example, the product of the number of channels of the first convolution kernel and the network reconstruction coefficient may be taken as the number of channels of the second convolution kernel, with all other settings kept the same as the first convolution kernel.
In the embodiment of the invention, the second convolution kernel is determined based on the first convolution kernel and the network reconstruction coefficient, so that the phase characteristics with higher angular resolution can be extracted by using the second convolution kernel, and a third convolution result is obtained.
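For concreteness, a minimal TensorFlow sketch of the three 1×1×1 3D convolutions follows; the channel count C, the coefficient value alpha, and the variable names are assumptions for illustration, not values from the patent.

```python
import tensorflow as tf

C = 32       # assumed channel count of the first convolution kernel
alpha = 4    # network reconstruction coefficient (super-resolution factor)

# The first and second convolution operations both use the first kernel's
# 1x1x1 configuration; the third uses the second kernel, whose channel
# count is the first kernel's multiplied by the reconstruction coefficient.
conv_first = tf.keras.layers.Conv3D(C, kernel_size=1)
conv_second = tf.keras.layers.Conv3D(C, kernel_size=1)
conv_third = tf.keras.layers.Conv3D(alpha * C, kernel_size=1)

# A 3D light field slice as a 5D tensor (batch, H, W, A, channels).
x = tf.zeros((2, 64, 64, 17, 1))
q5d, k5d, v5d = conv_first(x), conv_second(x), conv_third(x)
print(q5d.shape, k5d.shape, v5d.shape)  # (2,64,64,17,32) twice, (2,64,64,17,128)
```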
Step S103, calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map is used to describe the correspondence between polar plane image data in the three-dimensional light field data.
In the embodiment of the invention, an attention map is calculated from the first convolution result and the second convolution result. The first convolution result describes the phase characteristics of one group of three-dimensional light field data, specifically including the characteristics of a plurality of polar planes, and the second convolution result likewise describes the phase characteristics of another group of three-dimensional light field data. Because the attention map is calculated from both convolution results, it can be used to describe the correspondence between the polar plane image data in the three-dimensional light field data.
The product of the attention map and the third convolution result is then taken as polar plane attention data. Polar plane attention data is three-dimensional data.
And step S104, taking the polar plane attention data as guide map data, and carrying out guide filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data.
In the embodiment of the invention, the polar plane attention data is obtained through calculation in steps S101-S103, namely, a three-dimensional guide map required by guide filtering is constructed, the guide map data is obtained, the polar plane attention data is used as the guide map data, and guide filtering processing is carried out based on the three-dimensional light field data and the guide map data, so that target three-dimensional light field data is obtained. The target three-dimensional light field data is three-dimensional light field data with higher angular resolution.
And step S105, constructing training set data by utilizing the three-dimensional light field data and the target three-dimensional light field data, and performing model training by utilizing the training set data to obtain a light field super-resolution reconstruction model.
In the embodiment of the invention, a group of three-dimensional light field data and target three-dimensional light field data obtained by calculation based on the group of three-dimensional light field data are used as a group of training data in training set data. And training the neural network model by utilizing the training set data to obtain a light field super-resolution reconstruction model. Based on the light field super-resolution reconstruction model, the light field phase information can be effectively utilized to improve the light field angular resolution, and meanwhile, the accuracy and the robustness of super-resolution reconstruction are improved.
Referring to the schematic structure of the reconstruction model shown in FIG. 4, the model in the embodiment of the present invention includes a Transformer module; the Transformer is a deep learning architecture originally developed for natural language processing (NLP).
According to the embodiment of the invention, a light field data set and a network reconstruction coefficient are first obtained, polar plane image data are generated based on the light field data set, and the polar plane image data are grouped according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set. A first convolution operation and a second convolution operation are performed on the three-dimensional light field data based on the first convolution kernel, so as to extract the phase characteristics of the three-dimensional light field data and obtain a first convolution result and a second convolution result. An attention map describing the correspondence between the polar plane image data in the three-dimensional light field data is calculated from the first and second convolution results, and a third convolution result with higher angular resolution is obtained based on the network reconstruction coefficient and the second convolution kernel. The product of the attention map and the third convolution result is taken as polar plane attention data, which serves as guide map data for guided filtering of the three-dimensional light field data, yielding target three-dimensional light field data with high angular resolution. Training set data are constructed from the three-dimensional light field data and the target three-dimensional light field data, and model training on these data yields a light field super-resolution reconstruction model that realizes light field angular super-resolution and improves the continuity of the light field in the angular dimension.
In one possible implementation manner, performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain a third convolution result, includes: performing a first convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain first five-dimensional feature data, performing a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain second five-dimensional feature data, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain third five-dimensional feature data; and converting the first five-dimensional feature data into a three-dimensional tensor to obtain the first convolution result, converting the second five-dimensional feature data into a three-dimensional tensor to obtain the second convolution result, and converting the third five-dimensional feature data into a three-dimensional tensor to obtain the third convolution result.
In this possible embodiment, a first convolution operation is performed on the three-dimensional light field data based on the first convolution kernel to obtain first five-dimensional feature data, and a second convolution operation is performed on the three-dimensional light field data based on the first convolution kernel to obtain second five-dimensional feature data. The first five-dimensional feature data are converted into a three-dimensional tensor to obtain the first convolution result, and the second five-dimensional feature data are converted into a three-dimensional tensor to obtain the second convolution result. That is, two 1×1×1 3D convolutions are used to extract two 5D features, Q and K, from the sliced low-resolution 3D light field. Extracting the five-dimensional features specifically includes extracting the 3D light field from the 4D light field data and increasing the number of training samples using a shearing operation. The resulting 3D light field is then cropped into sub-light fields with a spatial resolution of 64 × 24 and a step size of 40 pixels, as sketched below. The two 1×1×1 3D convolutions extract the two 5D features from each sub-light field. Then, the angle and width dimensions of the 5D features are merged together and reshaped into 3D tensors, giving the first convolution result and the second convolution result respectively.
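The cropping into sub-light fields with a 40-pixel step can be sketched as follows; the patch dimensions follow the text, while the (A, Y, X) array layout and the function name are assumptions.

```python
import numpy as np

def crop_sub_light_fields(lf3d, patch_h=64, patch_w=24, stride=40):
    """Crop a 3D light field of shape (A, Y, X) into spatial sub-light
    fields of size patch_h x patch_w, sliding with the given pixel stride.
    The 64 x 24 patch size and 40-pixel step follow the text; the (A, Y, X)
    layout is an assumption."""
    A, Y, X = lf3d.shape
    patches = []
    for y0 in range(0, Y - patch_h + 1, stride):
        for x0 in range(0, X - patch_w + 1, stride):
            patches.append(lf3d[:, y0:y0 + patch_h, x0:x0 + patch_w])
    return np.stack(patches)   # (num_patches, A, patch_h, patch_w)
```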
A third convolution operation is performed on the three-dimensional light field data based on the second convolution kernel to obtain third five-dimensional feature data, and the third five-dimensional feature data are converted into a three-dimensional tensor to obtain the third convolution result. In a specific implementation, the light field feature V is extracted from the 3D light field by a 1×1×1 convolution different from those used for Q and K; this convolution differs in its number of channels, which is α·C, where α is the network reconstruction coefficient, i.e., the super-resolution coefficient. Reshaping V yields a three-dimensional tensor, i.e., the third convolution result. It should be noted that the larger the network reconstruction coefficient, the higher the resolution of the reconstruction result, but a larger coefficient also increases the complexity and computational cost of the network; its value can therefore be chosen according to actual requirements and preferences.
In a possible implementation manner, the first five-dimensional feature data, the second five-dimensional feature data and the third five-dimensional feature data respectively include: training sample number dimension data, height dimension data, width dimension data, angle dimension data and convolution kernel channel data; converting the first five-dimensional feature data into a three-dimensional tensor to obtain a first convolution result, converting the second five-dimensional feature data into the three-dimensional tensor to obtain a second convolution result, and converting the third five-dimensional feature data into the three-dimensional tensor to obtain a third convolution result, wherein the method comprises the following steps: determining first dimension data of a first convolution result by using training sample number dimension data and height dimension data of the first five-dimensional feature data, determining second dimension data of the first convolution result by using width dimension data and angle dimension data of the first five-dimensional feature data, and taking convolution kernel channel data of the first five-dimensional feature data as third dimension data of the first convolution result; determining first dimension data of a second convolution result by using training sample number dimension data and height dimension data of the second five-dimension feature data, and taking convolution kernel channel data of the second five-dimension feature data as second dimension data of the second convolution result; determining third dimension data of a second convolution result by utilizing the width dimension data and the angle dimension data of the second five-dimension characteristic data; determining first dimension data of a third convolution result by using training sample number dimension data and height dimension data of the third five-dimensional feature data, determining second dimension data of the third convolution result by using width dimension data and angle dimension data of the third five-dimensional feature data, and taking the product of convolution kernel channel data of the third five-dimensional feature data and the network reconstruction coefficient as third dimension data of the third convolution result.
In this possible embodiment, the 5D features Q and K are reshaped into two 3D tensors Q′ and K′ of sizes (B·H, W·A, C) and (B·H, C, W·A) respectively; likewise, the 5D feature V is reshaped into a 3D tensor V′ whose last dimension is the channel dimension expanded by the reconstruction coefficient. That is, their angle and width dimensions are merged together.
Here B denotes the batch size, the number of samples input in one training step; H denotes the spatial height, the number of pixels of the light field image in the vertical direction; W denotes the spatial width, the number of pixels of the light field image in the horizontal direction; A denotes the angular resolution, the number of angles of the light field image in the angular direction; and C denotes the number of channels, which is transformed by operations such as convolution and deconvolution. B·H, the product of the batch size and the spatial height, is the total number of samples: the first dimension of each reshaped tensor has size B·H, obtained by multiplying B and H. W·A, the product of the spatial width and the angular resolution, is the total number of view positions: the corresponding dimension of each reshaped tensor has size W·A, obtained by multiplying W and A.
In this way the angle dimension s is fused with the width dimension x (or the angle dimension t with the width dimension y). The purpose of reshaping the 5D feature tensors Q and K into the 3D tensors Q′ and K′, i.e., merging the angular and width dimensions (s and x, or t and y, in the light field), is to achieve non-local perception in the polar plane. This allows batch matrix multiplication between Q′ and K′, and an attention map M is generated using a softmax function, which captures the correspondence along the angular dimension.
In one possible implementation, calculating an attention map from the first convolution result and the second convolution result includes: and calculating a matrix product between the first convolution result and the second convolution result, and carrying out normalization processing on the matrix product to obtain an attention map.
In this possible embodiment, specifically, let Q and K denote the first convolution result and the second convolution result respectively, let V denote the third convolution result, and let F denote the polar plane attention data.
First, matrix multiplication at the patch level (a patch being a small region or subset of the image) is applied between Q and K, and the attention map M is generated using the softmax function to achieve non-local perception on the polar plane. The attention map M consists of B·H matrices of size (W·A)×(W·A). Neglecting the dimensions B and H, each matrix can be seen as a two-dimensional expanded view of a four-dimensional tensor. The attention map M captures the correspondence between all views in the input three-dimensional light field.
Then, V is multiplied by the attention map M to obtain a three-dimensional tensor F. This three-dimensional tensor is the output of the attention block; its channel dimension is expanded by a factor of the super-resolution coefficient. The polar plane attention is calculated as:
F = Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V
where d_k is the number of columns of the matrices Q and K. In this formula, the inner product of each pair of row vectors of Q and K is computed; to prevent the inner products from becoming excessively large, the result is divided by √d_k, which acts as a scaling factor.
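The whole computation, from reshaping the 5D features to the scaled softmax attention, can be sketched in TensorFlow as follows; the tensor layouts are the assumed (B, H, W, A, C) order used above.

```python
import tensorflow as tf

def polar_plane_attention(q5d, k5d, v5d):
    """Sketch of polar plane attention. q5d, k5d: (B, H, W, A, C);
    v5d: (B, H, W, A, alpha*C). Returns F of shape (B*H, W*A, alpha*C)."""
    B, H, W, A, C = q5d.shape
    Ca = v5d.shape[-1]                                  # alpha * C
    # Merge batch with height and width with angle: non-local over the EPI.
    Q = tf.reshape(q5d, (B * H, W * A, C))              # (BH, WA, C)
    Kt = tf.transpose(tf.reshape(k5d, (B * H, W * A, C)), (0, 2, 1))  # (BH, C, WA)
    V = tf.reshape(v5d, (B * H, W * A, Ca))             # (BH, WA, alpha*C)
    # Scaled dot-product: M = softmax(Q K^T / sqrt(d_k)) with d_k = C.
    d_k = tf.cast(C, q5d.dtype)
    M = tf.nn.softmax(tf.matmul(Q, Kt) / tf.sqrt(d_k), axis=-1)  # (BH, WA, WA)
    return tf.matmul(M, V)                              # polar plane attention data

# f = polar_plane_attention(q5d, k5d, v5d)   # using the features from above
```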
To speed up the computation, the invention computes polar plane attention on only a single channel. Since the polar plane attention is calculated only on the gray scale, in order to perform super-resolution on color images the invention up-samples the low-resolution 3D light field on the RGB channels and then uses the polar plane attention to guide the filtering of the up-sampled 3D light field. Guided filtering yields a more accurate super-resolution image and reduces the gradient loss caused by filtering. Thus, in one possible implementation, taking the polar plane attention data as guide map data and performing guided filtering processing based on the three-dimensional light field data and the guide map data includes: up-sampling the three-dimensional light field data in the RGB channels to obtain first color feature data, second color feature data, and third color feature data; taking the first color feature data, the second color feature data, and the third color feature data respectively as original image data; and conducting guided filtering processing based on the original image data and the guide map data respectively.
In this possible implementation manner, the three-dimensional light field data are up-sampled in the RGB channels respectively to obtain the first color feature data, the second color feature data, and the third color feature data; that is, a 3D CNN is used to up-sample the input 3D light field in each of the RGB channels, yielding three light field features P_R, P_G, and P_B. Then, in the same way as Q and K above, each of these features is reshaped into a 3D tensor, giving P′_R, P′_G, and P′_B.
The polar plane attention data are taken as guide map data, and the first color feature data, the second color feature data, and the third color feature data are each taken as original image data; guided filtering is then conducted for each channel separately. In implementation, the polar plane attention data F serve as the guide map I, and each of the features P_R, P_G, and P_B in turn serves as the original image P to be filtered. The guided filter forms a window w_k around each pixel k, where the window w_k is a local area, such as a square or rectangular region, defining a neighborhood around each pixel of the image. Within this window the guided filtering algorithm processes each pixel: the filter computes a local linear model for predicting the output value of each pixel. Linear coefficients are then generated using the guide map, and the original image is linearly transformed. Within the window w_k, the cost function of the guided filter is:
E(a_k, b_k) = Σ_{i∈w_k} ((a_k I_i + b_k − p_i)² + ε a_k²)
where I_i is a pixel of the guide map in w_k, p_i is the corresponding pixel of the original image P, a_k and b_k are the parameters of the local linear transformation, and ε is a control variable that prevents division by zero. Solving the cost function by least squares gives the weight a_k and the linear offset b_k:
a_k = ((1/|w|) Σ_{i∈w_k} I_i p_i − μ_k p̄_k) / (σ_k² + ε)
b_k = p̄_k − a_k μ_k
where μ_k and σ_k² are the mean and variance of the guide map I in w_k, |w| is the number of pixels in the window, and p̄_k is the mean of P in w_k. The guided filtering averages, at each pixel, the coefficients of all windows containing that pixel to obtain the output image. The output of the guided filter in the embodiment of the invention is:
q_i = ā_i I_i + b̄_i
where N is the number of pixels in the window, ā_i = (1/N) Σ_{k∈w_i} a_k and b̄_i = (1/N) Σ_{k∈w_i} b_k are the coefficients obtained after averaging, and a_k, b_k are the coefficients solved from the cost function.
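A minimal 2D sketch of these equations, using box means via SciPy's uniform_filter, is given below; the radius and ε values are assumptions, and the patent applies the filter to 3D tensors rather than single 2D images.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, radius=4, eps=1e-3):
    """Guided filtering of image P with guide I (2D float arrays).
    Implements a_k = (mean(I*P) - mu_k * pbar_k) / (sigma_k^2 + eps),
    b_k = pbar_k - a_k * mu_k, then q_i = abar_i * I_i + bbar_i."""
    size = 2 * radius + 1                 # window w_k
    mu = uniform_filter(I, size)          # mean of guide in each window
    pbar = uniform_filter(P, size)        # mean of input in each window
    corr_IP = uniform_filter(I * P, size)
    var_I = uniform_filter(I * I, size) - mu ** 2
    a = (corr_IP - mu * pbar) / (var_I + eps)   # local linear weight a_k
    b = pbar - a * mu                           # local linear offset b_k
    # Average the coefficients of all windows covering each pixel.
    abar = uniform_filter(a, size)
    bbar = uniform_filter(b, size)
    return abar * I + bbar                # output image q
```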
Since the data entering the guided filter were produced by converting the three-dimensional light field data into 5-dimensional features and then into 3-dimensional tensors, the data output by the guided filter must undergo the inverse processing, i.e., channel-to-angle super-resolution reconstruction. The three light field features output by the guided filter are each transformed by channel-to-angle pixel shuffling (PixelShuffle) into 5D tensors whose angular dimension is expanded. Finally, channel dimension reduction is performed using a 1 × 7 convolution to generate a light field, and the light fields of the three color channels are combined to produce the high-resolution light field, i.e., the target three-dimensional light field data.
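The channel-to-angle pixel shuffling can be sketched as a pair of reshapes that fold the α-times-expanded channel dimension into the angular dimension; the (B, H, W, A, α·C) layout is an assumption.

```python
import tensorflow as tf

def channel_to_angle_shuffle(x, alpha):
    """x: (B, H, W, A, alpha*C) guided-filter output; returns a 5D tensor
    (B, H, W, A*alpha, C) whose angular resolution is expanded alpha times,
    a 1D analogue of PixelShuffle along the angular axis."""
    B, H, W, A, Ca = x.shape
    C = Ca // alpha
    x = tf.reshape(x, (B, H, W, A, alpha, C))      # split channels into (alpha, C)
    return tf.reshape(x, (B, H, W, A * alpha, C))  # fold alpha into the angle dim
```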
In this possible embodiment, the guided filtering fully preserves edge detail information while fusing spatial and angular information for super-resolution. In addition, compared with ordinary bilateral filtering, guided filtering is markedly more efficient when processing large images. At the same time, guided filtering largely avoids the gradient reversal that occurs in bilateral filtering: because guided filtering is premised on a linear relationship between the output image and the guide map, consistency of the gradient is guaranteed, whereas in bilateral filtering, regions with large gradient changes have no similar pixels nearby, the Gaussian weights become unstable, and the resulting gradient can be reversed.
In one possible implementation manner, the grouping of the polar plane image data according to a specified angular direction may be performed as follows: grouping the polar plane image data according to a first specified angular direction and grouping the polar plane image data according to a second specified angular direction, wherein the first specified angular direction and the second specified angular direction are mutually perpendicular. For example, for ease of computation, a 4D light field L(x, y, s, t) is sliced along a fixed angular direction s or t, resulting in 3D slices of the 4D light field in the horizontal and vertical angular dimensions. In this possible embodiment, each light field in the Stanford (New) Light Field dataset can be sliced in the s and t directions, resulting in 17 horizontal 3D light fields (with t fixed) and 17 vertical 3D light fields (with s fixed).
In one possible implementation manner, in order to increase the size of the data set, the slicing results may further undergo data enhancement processing, thereby increasing the parallax of the light field. The data enhancement is a shearing operation, which shifts each view along the spatial direction in proportion to its angular coordinate (e.g., L′(x, y, s, t) = L(x + d·s, y, s, t)). Here d represents the shear amount of the light field image in the s direction, and performing the shearing operation on the light field images during data enhancement generates more training samples, improving the generalization ability of the model. The value of d can be set according to actual requirements; experiments showed that setting d to two shear amounts, one positive and one negative (the sign indicating the direction of the shear, i.e., to the left or to the right), gives better results. Here s and t denote the two angular directions of the light field image, with s corresponding to the horizontal direction and t to the vertical direction; their value ranges are determined by S and T respectively, the horizontal and vertical angular resolutions of the light field image, which are fixed by the light field camera.
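A minimal NumPy sketch of the shearing augmentation follows, assuming the standard EPI shear above and wrap-around boundaries (both assumptions; the patent does not specify the boundary handling or the shear magnitudes).

```python
import numpy as np

def shear_light_field(lf3d, d):
    """Shear a horizontal 3D light field of shape (A, Y, X): view s is
    shifted along x by d*s pixels, i.e. L'(x, y, s) = L(x + d*s, y, s).
    np.roll wrap-around at the borders is an implementation assumption."""
    sheared = np.empty_like(lf3d)
    for s in range(lf3d.shape[0]):
        sheared[s] = np.roll(lf3d[s], shift=-d * s, axis=-1)
    return sheared

# Two shear amounts, one positive and one negative, as described above;
# the magnitude 1 is a placeholder, not the patent's value.
# augmented = [shear_light_field(lf, d) for d in (1, -1)]
```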
The embodiment of the invention can be implemented according to a flow chart shown in fig. 2, and fig. 3 shows an effect diagram of the embodiment of the invention. The training and testing process is described in detail below.
In the embodiment of the invention, the reconstruction coefficient is set and the patch size is set to 28. The model is initialized from a Gaussian distribution with a mean of 0. In training, the model is optimized using the Adam (Adaptive Moment Estimation) method, with fixed settings for the learning rate, the exponential decay rates, and the total number of training iterations.
To accelerate the computation, the invention converts the 3-channel color image into a gray-scale map, and the polar plane Transformer computes attention only on the gray scale. Meanwhile, the polar plane Transformer down-samples the light field features while extracting them, reducing the size of the light field features; here B, the batch size, is the number of samples selected for one training step.
After training, the invention was tested on the MPI Light Field Archive dataset (a light field dataset) and the CIVIT (City of Vienna Image Texture) dataset (a research dataset for texture analysis). The MPI Light Field Archive dataset has a light field angular resolution of 1×97 and a spatial resolution of 960×720; the CIVIT dataset has a light field angular resolution of 1×193 and a spatial resolution of 1280×720. At test time, the angular dimension of each dataset was down-sampled by factors of 8 and 16 respectively for ease of computation.
The embodiment of the invention adopts peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as quantitative evaluation criteria for comparison with state-of-the-art methods. In the quantitative evaluation, the model proposed by the invention is compared with HDDRNet (a deep neural network model for high dynamic range (HDR) image processing), LLFF (a deep learning method for synthesizing a high dynamic range (HDR) or light field image from multiple images), DA2N (a deep learning model for depth estimation), and others; the comparison results are shown in FIG. 5 and FIG. 6.
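The two evaluation criteria can be computed with TensorFlow's built-in image metrics, as in the following sketch; the (N, H, W, C) batch layout and the [0, max_val] value range are assumptions.

```python
import tensorflow as tf

def evaluate(reference, reconstructed, max_val=1.0):
    """Mean PSNR and SSIM between reference and reconstructed view stacks,
    each of shape (N, H, W, C) with values in [0, max_val]."""
    psnr = tf.reduce_mean(tf.image.psnr(reference, reconstructed, max_val))
    ssim = tf.reduce_mean(tf.image.ssim(reference, reconstructed, max_val))
    return float(psnr), float(ssim)
```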
Finally, the average PSNR of the present embodiment over the two datasets is 44.37 dB, while those of HDDRNet, LLFF, and DA2N are 41.36 dB, 41.70 dB, and 42.82 dB respectively. The average SSIM of the present embodiment is 0.996, while that of HDDRNet, LLFF, and DA2N is 0.985, 0.995, respectively. The embodiment of the invention is nearly 2 dB higher in PSNR than the other methods, and its performance is better than that of the compared methods.
The invention provides a light field reconstruction model training method based on polar plane attention and related equipment. The technical effects include the following. First, the invention can stably perform angular super-resolution on light field images and is robust to occlusion. Second, guided filtering can efficiently reconstruct a high-angular-resolution light field using the polar plane attention data; it fully retains detail information while performing efficient super-resolution reconstruction, yielding more accurate results. In conclusion, the method has high accuracy and robustness, is simple to implement, and is computationally fast.
Based on the above light field reconstruction model training method based on polar plane attention provided by the embodiment of the present invention, the embodiment of the present invention further provides a light field reconstruction method based on polar plane attention, which includes: acquiring light field data to be processed; processing the light field data to be processed by utilizing a light field reconstruction model based on polar plane attention to obtain a light field reconstruction result; the light field reconstruction model based on the polar plane attention is obtained according to the method.
Based on the above light field reconstruction model training method based on polar plane attention provided by the embodiment of the present invention, the embodiment of the present invention further provides a light field reconstruction model training device based on polar plane attention, including: a data module for acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set; a convolution module for performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on a first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on a second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficient, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data; an attention module for calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map is used to describe the correspondence between polar plane image data in the three-dimensional light field data; a guided filtering module for taking the polar plane attention data as guide map data, and performing guided filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data; and a training module for constructing training set data using the three-dimensional light field data and the target three-dimensional light field data, and performing model training using the training set data to obtain a light field super-resolution reconstruction model.
In one possible implementation manner, performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain a third convolution result, includes: performing a first convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain first five-dimensional feature data, performing a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain second five-dimensional feature data, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain third five-dimensional feature data; and converting the first five-dimensional feature data into a three-dimensional tensor to obtain the first convolution result, converting the second five-dimensional feature data into a three-dimensional tensor to obtain the second convolution result, and converting the third five-dimensional feature data into a three-dimensional tensor to obtain the third convolution result.
In a possible implementation manner, the first five-dimensional feature data, the second five-dimensional feature data and the third five-dimensional feature data respectively include: training sample number dimension data, height dimension data, width dimension data, angle dimension data and convolution kernel channel data; converting the first five-dimensional feature data into a three-dimensional tensor to obtain a first convolution result, converting the second five-dimensional feature data into the three-dimensional tensor to obtain a second convolution result, and converting the third five-dimensional feature data into the three-dimensional tensor to obtain a third convolution result, wherein the method comprises the following steps: determining first dimension data of a first convolution result by using training sample number dimension data and height dimension data of the first five-dimensional feature data, determining second dimension data of the first convolution result by using width dimension data and angle dimension data of the first five-dimensional feature data, and taking convolution kernel channel data of the first five-dimensional feature data as third dimension data of the first convolution result; determining first dimension data of a second convolution result by using training sample number dimension data and height dimension data of the second five-dimension feature data, and taking convolution kernel channel data of the second five-dimension feature data as second dimension data of the second convolution result; determining third dimension data of a second convolution result by utilizing the width dimension data and the angle dimension data of the second five-dimension characteristic data; determining first dimension data of a third convolution result by using training sample number dimension data and height dimension data of the third five-dimensional feature data, determining second dimension data of the third convolution result by using width dimension data and angle dimension data of the third five-dimensional feature data, and taking the product of convolution kernel channel data of the third five-dimensional feature data and the network reconstruction coefficient as third dimension data of the third convolution result.
In one possible implementation manner, calculating an attention map from the first convolution result and the second convolution result includes: calculating a matrix product between the first convolution result and the second convolution result, and normalizing the matrix product to obtain the attention map.
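Continuing the same sketch, the attention map and the polar plane attention data could then be computed as follows (softmax is assumed as the normalization; the disclosure only says "normalization processing"):

```python
import torch.nn.functional as F

# Attention map: matrix product of the first and second convolution results,
# followed by normalization over the EPI positions.
attn = F.softmax(torch.bmm(q, k), dim=-1)   # (N*H, W*A, W*A)

# Polar plane attention data: product of the attention map and the third
# convolution result.
epi_attn = torch.bmm(attn, v)               # (N*H, W*A, C*alpha)
```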
In one possible implementation manner, taking the polar plane attention data as guide map data and performing guided filtering processing based on the three-dimensional light field data and the guide map data includes: up-sampling the three-dimensional light field data in the RGB channels to obtain first color feature data, second color feature data and third color feature data; taking the first color feature data, the second color feature data and the third color feature data respectively as original image data; and performing guided filtering processing based on the original image data and the guide map data, respectively.
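A self-contained sketch of this step is given below; the classic guided filter of He et al. is assumed, since the disclosure does not spell out the filter, and `guide_map`, the channel tensors, and the bicubic upsampling are likewise illustrative:

```python
import torch
import torch.nn.functional as F

def box_filter(x, r):
    # Local mean over a (2r+1) x (2r+1) window; x has shape (N, 1, H, W).
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r, count_include_pad=False)

def guided_filter(guide, src, r=4, eps=1e-4):
    # Classic guided filter (He et al.), single-channel for brevity:
    # `src` is smoothed under the structural guidance of `guide`.
    mean_g = box_filter(guide, r)
    mean_s = box_filter(src, r)
    cov_gs = box_filter(guide * src, r) - mean_g * mean_s
    var_g = box_filter(guide * guide, r) - mean_g * mean_g
    a = cov_gs / (var_g + eps)      # per-window linear coefficients
    b = mean_s - a * mean_g
    return box_filter(a, r) * guide + box_filter(b, r)

# Assumed usage: each up-sampled colour channel plays the role of the
# original image data, and the polar plane attention data (rearranged back
# to image layout) plays the role of the guide map data, e.g.:
#   red_up = F.interpolate(red, scale_factor=alpha, mode='bicubic')
#   red_out = guided_filter(guide_map, red_up)
```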
In one possible implementation manner, grouping the polar plane image data according to a specified angular direction includes: grouping the polar plane image data according to a first specified angular direction, and grouping the polar plane image data according to a second specified angular direction, wherein the first specified angular direction and the second specified angular direction are mutually perpendicular.
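As an illustration of such a grouping, the sketch below extracts epipolar plane images along two perpendicular angular directions (the 4-D storage order lf[u, v, y, x] is an assumption; the disclosure does not fix the layout):

```python
import torch

def group_epis(lf):
    # lf[u, v, y, x]: angular row, angular column, spatial row, spatial column.
    U, V, Y, X = lf.shape
    # First specified angular direction (horizontal EPIs): fix an angular row
    # u and a spatial row y; each slice spans (spatial x, angular v).
    horizontal = lf.permute(0, 2, 3, 1).reshape(U * Y, X, V)
    # Second, perpendicular angular direction (vertical EPIs): fix an angular
    # column v and a spatial column x; each slice spans (spatial y, angular u).
    vertical = lf.permute(1, 3, 2, 0).reshape(V * X, Y, U)
    return horizontal, vertical

lf = torch.rand(7, 7, 48, 48)        # assumed 7 x 7 views of 48 x 48 pixels
h_group, v_group = group_epis(lf)    # two groups of three-dimensional light field data
```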
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform the method of the embodiments of the present invention.
The embodiments of the present invention also provide a non-transitory machine-readable medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method of the embodiments of the present invention.
The embodiments of the present invention also provide a computer program product including a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method of the embodiments of the present invention.
Referring to fig. 7, a block diagram of an electronic device according to an embodiment of the present invention, which may be a server or a client, will now be described; it is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
A number of components in the electronic device are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 707 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device to exchange information/data with other devices over a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the method embodiments of the present invention may be implemented as a computer program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the methods described above by any other suitable means (e.g., by means of firmware).
A computer program for implementing the methods of embodiments of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the embodiments of the present invention, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the term "comprising" and its variants as used in the embodiments of the present invention are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". References to "a" or "an" in the embodiments of the present invention are intended to be illustrative rather than limiting, and those skilled in the art will understand that they are to be interpreted as "one or more" unless the context clearly indicates otherwise.
The user information (including but not limited to user equipment information, user personal information and the like) and data (including but not limited to data used for analysis, stored data, displayed data and the like) involved in the embodiments of the present invention are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entrances are provided for users to choose to authorize or refuse.
The steps described in the method embodiments provided in the embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. The various embodiments in this specification are described in a related manner, with identical and similar parts being referred to each other. In particular, for apparatus, devices, system embodiments, the description is relatively simple as it is substantially similar to method embodiments, see for relevant part of the description of method embodiments.
The above examples merely represent several embodiments of the present invention, which are described in considerable detail, but they are not to be construed as limiting the scope of the patent claims. It should be noted that several variations and modifications may be made by those skilled in the art without departing from the spirit of the invention, and all of these fall within the protection scope of the invention. Accordingly, the protection scope of the invention should be determined by the appended claims.

Claims (10)

1. A light field reconstruction model training method based on polar plane attention, characterized by comprising the following steps:
acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set;
performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on a first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on a second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficients, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data;
calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map describes correspondences between the polar plane image data in the three-dimensional light field data;
taking the polar plane attention data as guide map data, and performing guided filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data;
constructing training set data by utilizing the three-dimensional light field data and the target three-dimensional light field data, and performing model training by utilizing the training set data to obtain a light field super-resolution reconstruction model.
2. The light field reconstruction model training method based on polar plane attention according to claim 1, wherein performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain a third convolution result comprises:
performing the first convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain first five-dimensional feature data, performing the second convolution operation on the three-dimensional light field data based on the first convolution kernel to obtain second five-dimensional feature data, and performing the third convolution operation on the three-dimensional light field data based on the second convolution kernel to obtain third five-dimensional feature data;
and converting the first five-dimensional feature data into a three-dimensional tensor to obtain the first convolution result, converting the second five-dimensional feature data into a three-dimensional tensor to obtain the second convolution result, and converting the third five-dimensional feature data into a three-dimensional tensor to obtain the third convolution result.
3. The light field reconstruction model training method based on polar plane attention according to claim 2, wherein the first five-dimensional feature data, the second five-dimensional feature data and the third five-dimensional feature data each comprise: training sample number dimension data, height dimension data, width dimension data, angle dimension data and convolution kernel channel data; and converting the first five-dimensional feature data into a three-dimensional tensor to obtain the first convolution result, converting the second five-dimensional feature data into a three-dimensional tensor to obtain the second convolution result, and converting the third five-dimensional feature data into a three-dimensional tensor to obtain the third convolution result comprises:
determining first dimension data of a first convolution result by using training sample number dimension data and height dimension data of the first five-dimensional feature data, determining second dimension data of the first convolution result by using width dimension data and angle dimension data of the first five-dimensional feature data, and taking convolution kernel channel data of the first five-dimensional feature data as third dimension data of the first convolution result;
determining first dimension data of the second convolution result by using the training sample number dimension data and the height dimension data of the second five-dimensional feature data, taking the convolution kernel channel data of the second five-dimensional feature data as second dimension data of the second convolution result, and determining third dimension data of the second convolution result by using the width dimension data and the angle dimension data of the second five-dimensional feature data;
determining first dimension data of a third convolution result by using training sample number dimension data and height dimension data of the third five-dimensional feature data, determining second dimension data of the third convolution result by using width dimension data and angle dimension data of the third five-dimensional feature data, and taking the product of convolution kernel channel data of the third five-dimensional feature data and the network reconstruction coefficient as third dimension data of the third convolution result.
4. The light field reconstruction model training method based on polar plane attention according to claim 1, wherein calculating an attention map from the first convolution result and the second convolution result comprises:
calculating a matrix product between the first convolution result and the second convolution result, and normalizing the matrix product to obtain the attention map.
5. The light field reconstruction model training method based on polar plane attention according to claim 1, wherein taking the polar plane attention data as guide map data and performing guided filtering processing based on the three-dimensional light field data and the guide map data comprises:
up-sampling the three-dimensional light field data in the RGB channels to obtain first color feature data, second color feature data and third color feature data;
taking the first color feature data, the second color feature data and the third color feature data respectively as original image data;
and performing guided filtering processing based on the original image data and the guide map data, respectively.
6. The light field reconstruction model training method based on polar plane attention according to claim 1, wherein grouping the polar plane image data according to a specified angular direction comprises:
grouping the polar plane image data according to a first specified angular direction and grouping the polar plane image data according to a second specified angular direction; wherein the first specified angular direction and the second specified angular direction are mutually perpendicular.
7. A light field reconstruction method based on polar plane attention, comprising:
acquiring light field data to be processed;
processing the light field data to be processed by using a light field reconstruction model based on polar plane attention to obtain a light field reconstruction result; wherein the light field reconstruction model based on polar plane attention is obtained according to the method of any one of claims 1-6.
8. A light field reconstruction model training device based on polar plane attention, comprising:
the data module is used for acquiring a light field data set and a network reconstruction coefficient, generating polar plane image data based on the light field data set, and grouping the polar plane image data according to a specified angular direction to obtain a plurality of groups of three-dimensional light field data corresponding to the light field data set;
the convolution module is used for performing a first convolution operation and a second convolution operation on the three-dimensional light field data based on a first convolution kernel to obtain a first convolution result and a second convolution result, and performing a third convolution operation on the three-dimensional light field data based on a second convolution kernel to obtain a third convolution result; wherein the second convolution kernel is determined from the first convolution kernel and the network reconstruction coefficients, and the first convolution kernel is used for extracting phase characteristics of the three-dimensional light field data;
the attention module is used for calculating an attention map from the first convolution result and the second convolution result, and taking the product of the attention map and the third convolution result as polar plane attention data; wherein the attention map describes correspondences between the polar plane image data in the three-dimensional light field data;
the guided filtering module is used for taking the polar plane attention data as guide map data, and performing guided filtering processing based on the three-dimensional light field data and the guide map data to obtain target three-dimensional light field data;
the training module is used for constructing training set data by utilizing the three-dimensional light field data and the target three-dimensional light field data, and performing model training by utilizing the training set data to obtain a light field super-resolution reconstruction model.
9. An electronic device, comprising: a processor, and a memory storing a program, wherein the program comprises instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-7.
10. A non-transitory machine-readable medium storing computer instructions for causing a computer to perform the method according to any one of claims 1-7.
CN202311785291.5A 2023-12-25 2023-12-25 Light field reconstruction model training method based on polar plane attention and related equipment Active CN117475088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311785291.5A CN117475088B (en) 2023-12-25 2023-12-25 Light field reconstruction model training method based on polar plane attention and related equipment

Publications (2)

Publication Number Publication Date
CN117475088A true CN117475088A (en) 2024-01-30
CN117475088B CN117475088B (en) 2024-03-19

Family

ID=89623838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311785291.5A Active CN117475088B (en) 2023-12-25 2023-12-25 Light field reconstruction model training method based on polar plane attention and related equipment

Country Status (1)

Country Link
CN (1) CN117475088B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140327674A1 (en) * 2013-05-06 2014-11-06 Disney Enterprises, Inc. Scene reconstruction from high spatio-angular resolution light fields
US20230106939A1 (en) * 2020-07-21 2023-04-06 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Light field image processing method, light field image encoder and decoder, and storage medium
CN112381711A (en) * 2020-10-27 2021-02-19 深圳大学 Light field image reconstruction model training and rapid super-resolution reconstruction method
CN112767246A (en) * 2021-01-07 2021-05-07 北京航空航天大学 Multi-magnification spatial super-resolution method and device for light field image
CN114359041A (en) * 2021-11-24 2022-04-15 宁波大学 Light field image space super-resolution reconstruction method
CN114463172A (en) * 2022-01-05 2022-05-10 上海师范大学 Light field image super-resolution reconstruction method oriented to view consistency
WO2023201903A1 (en) * 2022-04-18 2023-10-26 清华大学 Occlusion-aware-based unsupervised light field disparity estimation system and method
CN116823602A (en) * 2023-05-26 2023-09-29 天津大学 Parallax-guided spatial super-resolution reconstruction method for light field image
CN117274067A (en) * 2023-11-22 2023-12-22 浙江优众新材料科技有限公司 Light field image blind super-resolution processing method and system based on reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAO ZHU et al.: "Revisiting Spatio-Angular Trade-off in Light Field Cameras and Extended Applications in Super-Resolution", IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 6, 1 June 2021 (2021-06-01), pages 3019, XP011854203, DOI: 10.1109/TVCG.2019.2957761 *
ZHANG Chi; LIU Fei; HOU Guangqi; SUN Zhenan; TAN Tieniu: "Light field imaging technology and its applications in computer vision", Journal of Image and Graphics, no. 03, 16 March 2016 (2016-03-16)
WANG Shuo; WANG Yafei: "Light field image depth estimation based on multi-stream epipolar convolutional neural network", Computer Applications and Software, no. 08, 12 August 2020 (2020-08-12)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant