CN113506336B - Light field depth prediction method based on convolutional neural network and attention mechanism - Google Patents

Light field depth prediction method based on convolutional neural network and attention mechanism

Info

Publication number
CN113506336B
CN113506336B (application CN202110732927.4A)
Authority
CN
China
Prior art keywords
light field
layer
module
attention
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732927.4A
Other languages
Chinese (zh)
Other versions
CN113506336A (en)
Inventor
张倩
杜昀璋
刘敬怀
花定康
王斌
朱苏磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
Original Assignee
Shanghai Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Normal University filed Critical Shanghai Normal University
Priority to CN202110732927.4A priority Critical patent/CN113506336B/en
Publication of CN113506336A publication Critical patent/CN113506336A/en
Application granted granted Critical
Publication of CN113506336B publication Critical patent/CN113506336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/557Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a light field depth prediction method based on a convolutional neural network and an attention mechanism, which comprises the following steps: acquiring a light field image and preprocessing the light field image to generate a light field image set; constructing a light field depth prediction model, wherein the model comprises an EPI learning module, an attention module and a feature fusion module; respectively inputting the light field image set into an EPI learning module and an attention module, and respectively acquiring EPI information of the light field image and each image weight; and respectively inputting the EPI information of the light field images and the weights of the images into a feature fusion module to obtain a light field depth prediction result. Compared with the prior art, the method has the advantages of high prediction precision, good practicability and the like.

Description

Light field depth prediction method based on convolutional neural network and attention mechanism
Technical Field
The invention relates to the technical field of light field depth estimation, in particular to a light field depth prediction method based on a convolutional neural network and an attention mechanism.
Background
The light field depth information reflects precise spatial information of the corresponding target. Scene depth acquisition is a technical key for determining whether a light field image can be widely applied, and is also one of research hotspots in the fields of computer vision and the like. The method plays an important role in the fields of three-dimensional reconstruction, target identification, automatic driving of automobiles and the like.
Currently, light field depth estimation algorithms are largely divided into non-learning-based methods and learning-based methods. The non-learning methods mainly comprise focus/defocus fusion methods and stereo-matching-based methods. The focus/defocus fusion method obtains the corresponding depth by measuring the blur of pixels across different focal stacks; the depth map obtained in this way retains more detail, but defocus errors are introduced, which reduces the accuracy of the depth map.
In recent years, deep learning has achieved great success in the field of light field depth estimation. For example, Chinese patent CN112785637A discloses a light field depth estimation method based on a dynamic fusion network, comprising the steps of: determining a light field data set and deriving a training set and a test set from it; expanding the light field dataset; building a dynamic fusion network model consisting of a dual-stream network and a multi-modal dynamic fusion module, where the dual-stream network consists of an RGB stream and a focal stack stream; taking the global RGB features and focus features output by the dual-stream network as inputs of the multi-modal dynamic fusion module and outputting the final depth map; training the constructed dynamic fusion network model on the training set; and testing the trained model on the test set and verifying it on a mobile phone dataset. The light field depth estimation method of that patent achieves better accuracy than other light field depth estimation methods, reduces noise, retains more detail, breaks the limits of the light field camera and has been successfully applied to ordinary consumer-grade camera data; however, it does not fully exploit the geometric characteristics of light field images, and its prediction accuracy is not high.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide the light field depth prediction method based on the convolutional neural network and the attention mechanism, which has high prediction precision and good practicability.
The aim of the invention can be achieved by the following technical scheme:
a light field depth prediction method based on a convolutional neural network and an attention mechanism comprises the following steps:
step 1: acquiring a light field image and preprocessing the light field image to generate a light field image set;
step 2: constructing a light field depth prediction model, wherein the model comprises an EPI learning module, an attention module and a feature fusion module;
Step 3: inputting the light field image set obtained in the step 1 into an EPI learning module and an attention module respectively, and obtaining EPI information of the light field image and each image weight respectively;
Step 4: and respectively inputting the EPI information of the light field images and the weights of the images into a feature fusion module to obtain a light field depth prediction result.
Preferably, the preprocessing of the light field image in the step 1 specifically includes: and performing data enhancement operation on the light field image.
Preferably, the EPI learning module specifically includes:
Parallel EPI learning networks are respectively arranged at four angles of 0 degree, 45 degree, 90 degree and 135 degree, and each of the four parallel EPI learning networks comprises a two-dimensional convolution layer, an activation layer, a two-dimensional convolution layer, an activation layer and a batch normalization layer which are sequentially connected.
More preferably, the loss function of the EPI learning network returns a loss value L averaged over the total number of samples N and computed from the predicted output x and the target output y.
More preferably, the activating layer specifically comprises: sigmoid function.
Preferably, the attention module comprises a two-dimensional convolution layer, resblock, a feature extraction layer, a Cost volume layer, a pooling layer, a full connection layer and an activation layer which are connected in sequence.
More preferably, the feature extraction layer specifically includes: the spatial pyramid pooling layer.
Preferably, the step 2 further includes verifying the light field depth prediction model during training.
More preferably, the verification method is as follows:
first, the mean square error MSE of the light field depth prediction result and ground truth is calculated:
MSE = (1/N) · Σi (Dep(i) − GT(i))²
wherein N is the total number of pixels in the light field image; Dep and GT are the light field depth prediction result and ground truth respectively; i indexes each pixel in the light field image;
secondly, the peak signal-to-noise ratio PSNR is calculated:
PSNR = 10 · log10(MAX² / MSE)
wherein MAX is the maximum pixel value in the light field image;
then, the structural similarity index SSIM is calculated:
SSIM(x, y) = ((2·μx·μy + c1) · (2·σx,y + c2)) / ((μx² + μy² + c1) · (σx² + σy² + c2))
wherein x and y are the light field depth prediction result and ground truth respectively; μx and μy are the mean pixel values of x and y; σx² and σy² are the variances of the corresponding images; σx,y is the covariance of x and y; c1 and c2 are small constants that stabilise the division;
finally, it is judged whether MSE, PSNR and SSIM are all within the preset thresholds; if so, training of the model is complete, otherwise training of the model continues.
Preferably, the feature fusion module comprises 8 convolution blocks and 1 optimization block which are connected in sequence; the optimization block comprises two-dimensional convolution layers and an activation layer.
Compared with the prior art, the invention has the following beneficial effects:
1. The prediction precision is high: the light field depth prediction method fully considers the geometric characteristics of the light field image, fully utilizes the angular characteristics and symmetry of the light field image, improves the accuracy of depth estimation, and can provide more accurate results under the same working time length and working conditions.
2. The practicability is good: the light field depth prediction method provided by the invention does not depend on precise equipment such as radars, antennas and the like, can conveniently acquire the required depth information, and has strong practicability.
Drawings
FIG. 1 is a flow chart of a light field depth prediction method according to the present invention;
FIG. 2 is a schematic diagram of a light field depth prediction model according to the present invention;
FIG. 3 is a schematic diagram of the three modes of the attention module according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
A light field depth prediction method based on a convolutional neural network and an attention mechanism comprises the following steps:
step 1: acquiring a light field image and preprocessing the light field image to generate a light field image set;
Acquisition of a light field image: with the gradual maturation of light field imaging technology, consumer-level light field cameras have come into large-scale use. A light field camera captures both the position and the direction information of the light rays in a scene, and the depth information of the scene can then be obtained by analysing this information with a passive depth estimation method. A single exposure of the light field camera yields four-dimensional light field information, i.e. images of the scene from multiple viewing angles. These images form a 9×9 array of 81 images in total, and the relative position of each picture in the array is fixed. The difference between the relative positions of the pictures (i.e. the baseline) and the positional difference of the same spatial point between the pictures (i.e. the disparity) are computed, and the distance between the corresponding point in space and the camera's centre lens is then obtained from the relation between the baseline and the disparity.
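As an illustration only, assuming the standard triangulation relation Z = f·B/d (focal length f in pixels, baseline B, disparity d) and purely illustrative names, a minimal Python sketch of this step might be:

    import numpy as np

    def depth_from_disparity(disparity, focal_length_px, baseline):
        # Z = f * B / d : distance of the scene point from the camera centre lens
        # disparity       : per-pixel disparity (pixels) between adjacent views
        # focal_length_px : focal length expressed in pixels
        # baseline        : spacing between adjacent sub-aperture viewpoints
        eps = 1e-8                                   # guard against zero disparity
        return focal_length_px * baseline / (np.abs(disparity) + eps)

    # toy check: disparity 0.5 px, f = 500 px, baseline 0.01 m  ->  depth 10 m
    depth = depth_from_disparity(np.array([[0.5]]), 500.0, 0.01)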
Because acquiring light field images requires specialised equipment such as a fixed camera array, a camera gantry or a light field camera, the amount of image data available for a given scene is sometimes insufficient; in practice this problem is addressed by preprocessing the data with data enhancement.
Under the condition of keeping the geometric relation among the sub-aperture images in the light field unchanged, transforming limited data to enlarge the available data scale, wherein the data enhancement operation in the embodiment comprises the following steps:
1. Light field image with center viewpoint transferred
The acquired light field data has 9×9 views, and the spatial resolution of each view is 512×512; by translating a window of size 7×7 over the 9×9 view array, more than nine times the data available for training can be obtained;
2. Angle of change
New training data can be obtained directly by rotation; alternatively, the epipolar-plane characteristics of the viewpoints can be preserved by first rotating the sub-aperture images and then rearranging and re-connecting the viewpoint images.
3. Scaling and flipping
It should be noted that, while the image is enlarged or reduced, the disparity value is also transformed accordingly.
The above three methods can operate in various dimensions of the image, such as center view, image size, image RGB values, image random color transforms, image gray values, gamma values, etc.
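As an illustration only, a sketch of these augmentation operations on a light field array of shape (9, 9, H, W, 3) with a centre-view disparity map could look as follows; the function name, the array layout and the choice of operations performed per call are assumptions, not taken from the patent:

    import numpy as np

    def augment_light_field(lf, disp, scale=1.0):
        # lf  : light field views, shape (9, 9, H, W, 3)
        # disp: centre-view disparity map, shape (H, W)
        samples = []
        # 1. shift the centre viewpoint: every 7x7 window of the 9x9 view array
        for u0 in range(3):
            for v0 in range(3):
                samples.append((lf[u0:u0 + 7, v0:v0 + 7], disp))
        # 2. change the angle: rotate each sub-aperture image by 90 degrees and
        #    rearrange the viewpoint grid so the light field geometry stays consistent
        rot = np.rot90(lf, k=1, axes=(2, 3))
        rot = np.rot90(rot, k=1, axes=(0, 1))
        samples.append((rot, np.rot90(disp)))
        # 3. scaling: when the images are resized the disparity values are scaled by
        #    the same factor (the spatial resize itself is omitted here for brevity)
        if scale != 1.0:
            samples.append((lf, disp * scale))
        return samples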
Step 2: constructing a light field depth prediction model shown in fig. 2, wherein the model comprises an EPI learning module, an attention module and a feature fusion module;
The construction method of the EPI learning module comprises the following steps:
The four-dimensional light field image may be represented as L (x, y, u, v), where (x, y) is the spatial resolution and (u, v) is the angular resolution, and the relationship of the light field image center to other viewpoints may be represented as:
L(x,y,0,0)=L(x+d(x,y)*u,y+d(x,y)*v,u,v)
Where d (x, y) is the parallax of the center viewpoint pixel (x, y) and the pixel corresponding to the neighboring viewpoint.
For an angular direction θ (tan θ=v/u), the following relationship is reestablished:
L(x,y,0,0)=L(x+d(x,y)*u, y+d(x,y)*u*tanθ, u, u*tanθ)
Wherein, since the light field viewpoints form a regular 9×9 array, a corresponding viewpoint exists only when u*tanθ is an integer. Thus, the four viewpoint directions at angles 0°, 45°, 90° and 135° are selected, and the angular resolution of the light field image can be assumed to be (2n+1)×(2n+1).
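As an illustration, a minimal Python sketch that gathers the view stacks along these four directions from a light field array of shape (2n+1, 2n+1, H, W, C) could look as follows; the function name and array layout are assumptions:

    import numpy as np

    def directional_view_stacks(lf):
        # lf: light field views, shape (2n+1, 2n+1, H, W, C), centre view at (n, n)
        # returns the view stacks along 0, 45, 90 and 135 degrees - the only
        # directions for which u*tan(theta) stays an integer on the view grid
        n_views = lf.shape[0]
        idx = np.arange(n_views)
        stack_0 = lf[n_views // 2, :]          # horizontal row of views   (0 degrees)
        stack_90 = lf[:, n_views // 2]         # vertical column of views  (90 degrees)
        stack_45 = lf[idx, idx]                # one diagonal of views     (45 degrees)
        stack_135 = lf[idx, idx[::-1]]         # the other diagonal        (135 degrees)
        return stack_0, stack_45, stack_90, stack_135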
Therefore, parallel EPI learning networks are arranged for the four angles 0°, 45°, 90° and 135°, each performing feature extraction on the light field image data along its own direction. The four parallel EPI learning networks each include a two-dimensional convolution layer 2D Conv, an activation layer Relu, a second two-dimensional convolution layer 2D Conv, a second activation layer Relu, and a batch normalization layer BN, connected in sequence.
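A minimal PyTorch sketch of one such branch, assuming the views of one direction are stacked along the channel axis and using the 2×2 kernel with stride 1 mentioned below (channel widths are assumptions), might be:

    import torch.nn as nn

    class EPIStream(nn.Module):
        # One of the four parallel EPI learning branches (0, 45, 90 or 135 degrees):
        # 2D Conv -> ReLU -> 2D Conv -> ReLU -> batch normalization.
        def __init__(self, in_ch=9, mid_ch=64):   # in_ch: e.g. 9 stacked grayscale views (assumption)
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, kernel_size=2, stride=1),   # small kernel for small baselines
                nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, mid_ch, kernel_size=2, stride=1),
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(mid_ch),
            )

        def forward(self, x):
            return self.body(x)

    # four parallel streams, one per angular direction
    streams = nn.ModuleList(EPIStream() for _ in range(4))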
In the two-dimensional convolution layer 2D Conv, A and B are two-dimensional matrices, and the convolution result is:
C(j,k) = Σp Σq A(p,q) · B(j−p+1, k−q+1)
The activation layer Relu applies an activation function to the output z of the upper-layer neurons; the sigmoid function is specifically
φ(z) = 1 / (1 + e^(−z))
The activation function introduces nonlinearity, and φ(z) is passed on as the input of the next layer; ReLU, φ(z) = max(0, z), avoids the problems of gradient explosion and gradient vanishing to a certain extent.
Since the deep neural network has many layers, the learning speed drops: as the inputs to the lower layers drift larger or smaller, the upper layers are pushed into the saturation region and learning stops prematurely. To prevent this, batch normalization (BN) is applied after the last activation layer. The batch normalization layer BN is specifically:
x' = (x − μ) / σ,  y = g · x' + b
wherein μ is the shift (conversion) parameter and σ is the scaling parameter; these two parameters shift and scale the data so that it follows a standard distribution with mean 0 and variance 1; b is the re-shift parameter and g is the re-scaling parameter, which ensure that the expressive power of the model is not reduced by normalization.
The loss function of the EPI learning network returns a loss value L averaged over the total number of samples N and computed from the predicted output x and the target output y.
To address the very small baseline of the light field, small disparity values are measured using a convolution kernel of size 2×2 with a stride of 1. The convolution depth in the network is set to 7, and the learning rate is 1e-5.
The construction method of the attention module comprises the following steps:
A large number of pictures from different angular perspectives are acquired in the light field data. As described in the first step, depth information in three-dimensional space can be obtained by computing the disparity information and EPI information of corresponding points in these pictures. However, the pictures contain a large amount of redundant information, so an attention module is provided: it computes and assigns a weight to each picture in the light field, highlighting the importance and contribution of the pictures that are most valuable for estimating the light field depth.
The attention module includes, connected in sequence, a two-dimensional convolution layer 2D Conv, a Resblock, a feature extraction layer FE block, a Cost volume layer, a pooling layer Pooling, a fully connected layer Connected and an activation layer Relu, specifically:
Firstly, the light field image is preprocessed by the two-dimensional convolution layer 2D Conv and the Resblock, and feature extraction is then carried out in the feature extraction layer FE block to cope with texture-less areas and non-Lambertian surfaces. The feature extraction layer FE block extracts features according to the connections of neighbouring regions and concatenates all feature maps to obtain the output feature map. Next, in the Cost volume layer, the relative positions of the feature views are adjusted, and the feature maps are connected into cost volumes with five dimensions (batch size × disparity × height × width × feature size). Finally, the input cost volumes are assembled to generate the attention map, followed by the fully connected layer and the activation layer. Taking the HCI dataset as an example, there are 9×9 sub-aperture views in each scene, so a 9×9 attention map is finally obtained. This part of the operation is divided into three steps:
first, extracting image features using a feature extraction layer
The feature extraction layer uses an SPP (spatial pyramid pooling) module, which exploits the information of the regions neighbouring the corresponding points to estimate the disparity value.
The SPP module is specifically: in the CNN, the last pooling layer is removed and replaced with an SPP layer that performs max pooling. The SPP-net can be trained with standard back-propagation.
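A hedged sketch of such an SPP block in PyTorch, max-pooling the feature map at a few scales and concatenating the upsampled results with the input (the pooling scales are assumptions), could be:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SPPBlock(nn.Module):
        # Spatial pyramid pooling sketch: max-pool at several scales, upsample
        # back to the input size and concatenate along the channel axis, so that
        # neighbouring-region context is attached to every pixel.
        def __init__(self, scales=(1, 2, 4, 8)):
            super().__init__()
            self.scales = scales

        def forward(self, x):
            h, w = x.shape[-2:]
            pyramid = [x]
            for s in self.scales:
                pooled = F.adaptive_max_pool2d(x, output_size=(s, s))
                pyramid.append(F.interpolate(pooled, size=(h, w),
                                             mode='bilinear', align_corners=False))
            return torch.cat(pyramid, dim=1)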
Second, calculate the Cost volume
The feature map of each sub-aperture view is passed through the SPP module to obtain the feature map of each view. To make better use of these feature maps, a Cost volume is computed. Based on the feature maps provided by the SPP module, the input images are shifted along the u or v direction by different disparity levels, so that the second half of the network can directly access pixel information at different spatial locations using relatively small receptive fields. Nine disparity levels are set, ranging from -4 to 4. After shifting, the feature maps are concatenated into a 5D Cost volume of size batch size × disparity × height × width × feature size.
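As an illustrative sketch (using torch.roll as a simple stand-in for the sub-pixel warping a full implementation would need), the cost volume for one sub-aperture view could be assembled as follows; all names are assumptions:

    import torch

    def build_cost_volume(feat, u, v, disparities=range(-4, 5)):
        # feat  : SPP feature map of one view, shape (batch, feature, H, W)
        # (u, v): angular offset of this view from the centre view
        # returns a tensor of shape batch x disparity x H x W x feature
        shifted = []
        for d in disparities:                               # the nine disparity levels -4 ... 4
            shifted.append(torch.roll(feat, shifts=(d * u, d * v), dims=(-2, -1)))
        volume = torch.stack(shifted, dim=1)                # (batch, disparity, feature, H, W)
        return volume.permute(0, 1, 3, 4, 2)                # (batch, disparity, H, W, feature)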
Third step, obtaining the attention map
The attention map is essentially a 9×9 map that indicates the importance of each corresponding view. The first type is the free attention map, in which each view has its own importance value and all images in the light field picture are learned. The second type is the symmetric attention map: the light field image array is symmetric along the u and v axes, so only the 25 images determined by this symmetry need to be learned, and the entire map can be constructed by mirroring along the u-axis and v-axis. In the third type, the image array is symmetric along the u, v and two diagonal axes; again using symmetry, weights for the 15 symmetric images are calculated, and the complete attention map is then constructed by mirroring along the diagonal, v and u axes. By constraining the structure of the attention map, the number of learnable weights is reduced. With the Cost volume as input, the view selection module generates the attention map through a global pooling layer, then a fully connected layer, and finally an activation layer, thereby obtaining the attention distribution for all pictures of the light field image.
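A hedged PyTorch sketch of this view-selection head (global pooling, a fully connected layer and a sigmoid, with the number of learned weights depending on the symmetry mode; the hidden sizes and the omission of the mirroring step are assumptions) might be:

    import torch
    import torch.nn as nn

    class ViewAttention(nn.Module):
        # Sketch of the view-selection head: global pooling -> fully connected
        # layer -> sigmoid, yielding one importance weight per learned view.
        def __init__(self, feat_dim, mode='free'):
            super().__init__()
            # free mode learns 81 weights; u/v symmetry needs 25; u/v/diagonal symmetry needs 15
            n_learned = {'free': 81, 'uv': 25, 'uv_diag': 15}[mode]
            self.fc = nn.Linear(feat_dim, n_learned)

        def forward(self, cost_volume):
            # cost_volume: (batch, disparity, H, W, feature)
            pooled = cost_volume.mean(dim=(1, 2, 3))        # global pooling -> (batch, feature)
            weights = torch.sigmoid(self.fc(pooled))        # importance of each learned view
            # mirroring the learned weights along the u, v (and diagonal) axes to
            # rebuild the full 9x9 attention map is omitted here for brevity
            return weights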
The attention module includes three modes, as shown in FIG. 3: in the first mode, the module evaluates an attention weight for each image; in the second mode, the map is mirrored about the 0° and 90° direction axes, so only those weights are learned; in the last mode, mirroring about the 45° and 135° diagonal directions is added. The three modes together yield the attention map. The attention map is combined with the convolutional layers of the neural network in the form of weights, which then reweight the sub-aperture views.
The construction method of the feature fusion module comprises the following steps:
The feature fusion module comprises 8 convolution blocks and 1 optimization block which are connected in sequence, wherein the optimization block comprises two-dimensional convolution layers and an activation layer.
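A minimal sketch of such a fusion head in PyTorch (channel widths, the internal make-up of a convolution block and the position of the activation inside the optimization block are assumptions) could be:

    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # generic convolution block; its internal composition is an assumption
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                             nn.ReLU(inplace=True))

    class FeatureFusion(nn.Module):
        # eight convolution blocks followed by one optimization block
        # (two 2D convolution layers with an activation), ending in a 1-channel depth map
        def __init__(self, in_ch=128, width=64):
            super().__init__()
            blocks = [conv_block(in_ch, width)] + [conv_block(width, width) for _ in range(7)]
            self.blocks = nn.Sequential(*blocks)
            self.optimize = nn.Sequential(nn.Conv2d(width, width, 3, padding=1),
                                          nn.ReLU(inplace=True),
                                          nn.Conv2d(width, 1, 3, padding=1))

        def forward(self, fused):
            return self.optimize(self.blocks(fused))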
Step 2 further includes verifying the light field depth prediction model during training, specifically:
first, the mean square error MSE of the light field depth prediction result and ground truth is calculated:
MSE = (1/N) · Σi (Dep(i) − GT(i))²
wherein N is the total number of pixels in the light field image; Dep and GT are the light field depth prediction result and ground truth respectively; i indexes each pixel in the light field image;
secondly, the peak signal-to-noise ratio PSNR is calculated:
PSNR = 10 · log10(MAX² / MSE)
wherein MAX is the maximum pixel value in the light field image;
then, the structural similarity index SSIM is calculated:
SSIM(x, y) = ((2·μx·μy + c1) · (2·σx,y + c2)) / ((μx² + μy² + c1) · (σx² + σy² + c2))
wherein x and y are the light field depth prediction result and ground truth respectively; μx and μy are the mean pixel values of x and y; σx² and σy² are the variances of the corresponding images; σx,y is the covariance of x and y; c1 and c2 are small constants that stabilise the division;
finally, it is judged whether MSE, PSNR and SSIM are all within the preset thresholds; if so, training of the model is complete, otherwise training of the model continues.
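A hedged sketch of this validation check in Python, computing MSE, PSNR and a simplified single-window SSIM from the definitions above (the thresholds and the stabilising constants are placeholders), could be:

    import numpy as np

    def validate(dep, gt, max_val=255.0, thresholds=(0.5, 30.0, 0.9)):
        # dep, gt   : predicted depth map and ground truth, same shape
        # thresholds: placeholder (mse_max, psnr_min, ssim_min)
        mse = np.mean((dep - gt) ** 2)
        psnr = 10.0 * np.log10(max_val ** 2 / max(mse, 1e-12))
        mu_x, mu_y = dep.mean(), gt.mean()
        var_x, var_y = dep.var(), gt.var()
        cov_xy = np.mean((dep - mu_x) * (gt - mu_y))
        c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2    # usual SSIM stabilisers
        ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
        mse_max, psnr_min, ssim_min = thresholds
        passed = (mse <= mse_max) and (psnr >= psnr_min) and (ssim >= ssim_min)
        return mse, psnr, ssim, passed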
Step 3: inputting the light field image set obtained in the step 1 into an EPI learning module and an attention module respectively, and obtaining EPI information of the light field image and each image weight respectively;
Step 4: and respectively inputting the EPI information of the light field images and the weights of the images into a feature fusion module to obtain a light field depth prediction result.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is defined by the protection scope of the claims.

Claims (8)

1. The light field depth prediction method based on the convolutional neural network and the attention mechanism is characterized by comprising the following steps of:
step 1: acquiring a light field image and preprocessing the light field image to generate a light field image set;
step 2: constructing a light field depth prediction model, wherein the model comprises an EPI learning module, an attention module and a feature fusion module;
Step 3: inputting the light field image set obtained in the step 1 into an EPI learning module and an attention module respectively, and obtaining EPI information of the light field image and each image weight respectively;
Step 4: respectively inputting the EPI information of the light field images and the weights of the images into a feature fusion module to obtain a light field depth prediction result;
The attention module comprises a two-dimensional convolution layer, resblock, a feature extraction layer, a Cost volume layer, a pooling layer, a full connection layer and an activation layer which are connected in sequence; the characteristic extraction layer specifically comprises: a spatial pyramid pooling layer; specific:
Firstly, the light field image is preprocessed by the two-dimensional convolution layer 2D Conv and the Resblock, and feature extraction is then carried out in the feature extraction layer FE block to cope with texture-less areas and non-Lambertian surfaces; the feature extraction layer FE block extracts features according to the connections of adjacent areas and concatenates all feature maps to obtain the output feature maps; next, in the Cost volume layer, the relative positions of the feature views are adjusted and the feature maps are connected into cost volumes with five dimensions, the five dimensions being batch size × disparity × height × width × feature size; finally, the input cost volumes are assembled to generate the attention map, followed by the fully connected layer and the activation layer; this part of the operation is divided into three steps:
first, extracting image features using a feature extraction layer
The feature extraction layer uses an SPP (spatial pyramid pooling) module, which estimates the disparity value by utilizing the information of the areas adjacent to the corresponding points;
the SPP module specifically comprises: removing the last pooling layer in a CNN, and changing the pooling layer into an SPP to perform the maximum pooling operation;
second, calculate the Cost volume
The feature map of each sub-aperture view is passed through the SPP module to obtain the feature map of each view; a Cost volume is computed, and according to the feature maps provided by the SPP module the input images are shifted along the u or v direction at different disparity levels; nine disparity levels are set, ranging from -4 to 4; after shifting, the feature maps are concatenated into a 5D Cost volume of size batch size × disparity × height × width × feature size;
third step, obtaining the attention map
The attention map is essentially a 9×9 map that indicates the importance of the corresponding view; the first type is the free attention map, in which each view has its own importance value and all images in the light field picture are learned; the second type is the symmetric attention map, in which the light field image array is symmetric along the u and v axes, so only the 25 images determined by this symmetry are learned and the entire map is constructed by mirroring along the u-axis and v-axis; in the third type, the image array is symmetric along the u, v and two diagonal axes, weights for the 15 symmetric images are calculated using this symmetry, and the complete attention map is then constructed by mirroring along the diagonal, v and u axes; the number of learnable weights is reduced by constraining the structure of the attention map; with the Cost volume as input, the view selection module generates the attention map through a global pooling layer, then a fully connected layer, and finally an activation layer, thereby obtaining the attention distribution for all pictures of the light field image.
2. The method for predicting the depth of a light field based on a convolutional neural network and an attention mechanism according to claim 1, wherein the preprocessing of the light field image in step 1 is specifically: and performing data enhancement operation on the light field image.
3. The method for predicting light field depth based on convolutional neural network and attention mechanism according to claim 1, wherein the EPI learning module specifically comprises:
Parallel EPI learning networks are respectively arranged at four angles of 0 degree, 45 degree, 90 degree and 135 degree, and each of the four parallel EPI learning networks comprises a two-dimensional convolution layer, an activation layer, a two-dimensional convolution layer, an activation layer and a batch normalization layer which are sequentially connected.
4. A method for predicting light field depth based on convolutional neural network and attention mechanism as recited in claim 3, wherein the loss function of said EPI learning network returns a loss value L averaged over the total number of samples N and computed from the predicted output x and the target output y.
5. The method for predicting light field depth based on convolutional neural network and attention mechanism as recited in claim 3, wherein said active layer specifically comprises: sigmoid function.
6. The method of claim 1, wherein step 2 further comprises verifying the light field depth prediction model during training.
7. The method for predicting light field depth based on convolutional neural network and attention mechanism as recited in claim 6, wherein said verification method is as follows:
first, the mean square error MSE of the light field depth prediction result and ground truth is calculated:
MSE = (1/N) · Σi (Dep(i) − GT(i))²
wherein N is the total number of pixels in the light field image; Dep and GT are the light field depth prediction result and ground truth respectively; i indexes each pixel in the light field image;
secondly, the peak signal-to-noise ratio PSNR is calculated:
PSNR = 10 · log10(MAX² / MSE)
wherein MAX is the maximum pixel value in the light field image;
then, the structural similarity index SSIM is calculated:
SSIM(x, y) = ((2·μx·μy + c1) · (2·σx,y + c2)) / ((μx² + μy² + c1) · (σx² + σy² + c2))
wherein x and y are the light field depth prediction result and ground truth respectively; μx and μy are the mean pixel values of x and y; σx² and σy² are the variances of the corresponding images; σx,y is the covariance of x and y; c1 and c2 are small constants that stabilise the division;
finally, it is judged whether MSE, PSNR and SSIM are all within the preset thresholds; if so, training of the model is complete, otherwise training of the model continues.
8. The light field depth prediction method based on the convolutional neural network and the attention mechanism according to claim 1, wherein the feature fusion module comprises 8 convolutional blocks and 1 optimization block which are connected in sequence; the optimization block comprises two-dimensional convolution layers and an activation layer.
CN202110732927.4A 2021-06-30 2021-06-30 Light field depth prediction method based on convolutional neural network and attention mechanism Active CN113506336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732927.4A CN113506336B (en) 2021-06-30 2021-06-30 Light field depth prediction method based on convolutional neural network and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732927.4A CN113506336B (en) 2021-06-30 2021-06-30 Light field depth prediction method based on convolutional neural network and attention mechanism

Publications (2)

Publication Number Publication Date
CN113506336A CN113506336A (en) 2021-10-15
CN113506336B true CN113506336B (en) 2024-04-26

Family

ID=78011428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732927.4A Active CN113506336B (en) 2021-06-30 2021-06-30 Light field depth prediction method based on convolutional neural network and attention mechanism

Country Status (1)

Country Link
CN (1) CN113506336B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113965757A (en) * 2021-10-21 2022-01-21 上海师范大学 Light field image coding method and device based on EPI (intrinsic similarity) and storage medium
CN114511605B (en) * 2022-04-18 2022-09-02 清华大学 Light field depth estimation method and device, electronic equipment and storage medium
CN114511609B (en) * 2022-04-18 2022-09-02 清华大学 Unsupervised light field parallax estimation system and method based on occlusion perception

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846473A (en) * 2018-04-10 2018-11-20 杭州电子科技大学 Light field depth estimation method based on direction and dimension self-adaption convolutional neural networks
CN109064405A (en) * 2018-08-23 2018-12-21 武汉嫦娥医学抗衰机器人股份有限公司 A kind of multi-scale image super-resolution method based on dual path network
CN111583313A (en) * 2020-03-25 2020-08-25 上海物联网有限公司 Improved binocular stereo matching method based on PSmNet
CN111696148A (en) * 2020-06-17 2020-09-22 中国科学技术大学 End-to-end stereo matching method based on convolutional neural network
CN112287940A (en) * 2020-10-30 2021-01-29 西安工程大学 Semantic segmentation method of attention mechanism based on deep learning
CN112767466A (en) * 2021-01-20 2021-05-07 大连理工大学 Light field depth estimation method based on multi-mode information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019096310A1 (en) * 2017-11-20 2019-05-23 Shanghaitech University Light field image rendering method and system for creating see-through effects

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846473A (en) * 2018-04-10 2018-11-20 杭州电子科技大学 Light field depth estimation method based on direction and dimension self-adaption convolutional neural networks
CN109064405A (en) * 2018-08-23 2018-12-21 武汉嫦娥医学抗衰机器人股份有限公司 A kind of multi-scale image super-resolution method based on dual path network
CN111583313A (en) * 2020-03-25 2020-08-25 上海物联网有限公司 Improved binocular stereo matching method based on PSmNet
CN111696148A (en) * 2020-06-17 2020-09-22 中国科学技术大学 End-to-end stereo matching method based on convolutional neural network
CN112287940A (en) * 2020-10-30 2021-01-29 西安工程大学 Semantic segmentation method of attention mechanism based on deep learning
CN112767466A (en) * 2021-01-20 2021-05-07 大连理工大学 Light field depth estimation method based on multi-mode information

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Changha Shin et al. EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018 (abstract, sections 1 and 3). *
An improved stereo matching algorithm based on PSMNet; 刘建国; 冯云剑; 纪郭; 颜伏伍; 朱仕卓; Journal of South China University of Technology (Natural Science Edition), No. 01 (full text) *
Light field imaging technology and its applications in computer vision; 张驰; 刘菲; 侯广琦; 孙哲南; 谭铁牛; Journal of Image and Graphics, No. 03 (full text) *
杨博雄 (ed.). Deep Learning Theory and Practice. Beijing University of Posts and Telecommunications Press, 2020, pp. 142-143. *

Also Published As

Publication number Publication date
CN113506336A (en) 2021-10-15

Similar Documents

Publication Publication Date Title
CN113506336B (en) Light field depth prediction method based on convolutional neural network and attention mechanism
RU2698402C1 (en) Method of training a convolutional neural network for image reconstruction and a system for forming an image depth map (versions)
CN110036410B (en) Apparatus and method for obtaining distance information from view
CN102997891B (en) Device and method for measuring scene depth
CN110880162B (en) Snapshot spectrum depth combined imaging method and system based on deep learning
CN111028273B (en) Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111709985A (en) Underwater target ranging method based on binocular vision
CN113256699B (en) Image processing method, image processing device, computer equipment and storage medium
EP3026629A1 (en) Method and apparatus for estimating depth of focused plenoptic data
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN115082450A (en) Pavement crack detection method and system based on deep learning network
CN116778288A (en) Multi-mode fusion target detection system and method
CN114419568A (en) Multi-view pedestrian detection method based on feature fusion
CN114092540A (en) Attention mechanism-based light field depth estimation method and computer readable medium
CN113538545B (en) Monocular depth estimation method based on electro-hydraulic adjustable-focus lens and corresponding camera and storage medium
CN111325828A (en) Three-dimensional face acquisition method and device based on three-eye camera
US20220044068A1 (en) Processing perspective view range images using neural networks
CN114332796A (en) Multi-sensor fusion voxel characteristic map generation method and system
CN116883981A (en) License plate positioning and identifying method, system, computer equipment and storage medium
Alaniz-Plata et al. ROS and Stereovision Collaborative System
CN116778091A (en) Deep learning multi-view three-dimensional reconstruction algorithm based on path aggregation
CN113160416B (en) Speckle imaging device and method for coal flow detection
CN114119704A (en) Light field image depth estimation method based on spatial pyramid pooling
Sekkati et al. Direct and indirect 3-D reconstruction from opti-acoustic stereo imaging
Rasyidy et al. A Framework for Road Boundary Detection based on Camera-LIDAR Fusion in World Coordinate System and Its Performance Evaluation Using Carla Simulator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant