CN110163246A - Unsupervised depth estimation method for monocular light field images based on convolutional neural networks

Info

Publication number
CN110163246A
Authority
CN
China
Prior art keywords
image
model
light field
network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910276356.0A
Other languages
Chinese (zh)
Other versions
CN110163246B (en)
Inventor
戴国骏
刘高敏
张桦
周文晖
陶星
戴美想
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinmai Microelectronics Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201910276356.0A
Publication of CN110163246A
Application granted
Publication of CN110163246B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an unsupervised depth estimation method for monocular light field images based on a convolutional neural network. The method first uses a public large-scale light field image data set as the training set and balances the training samples through data enhancement and data expansion. An improved ResNet50 network model is then constructed: an encoder and a decoder extract the high-level and low-level features respectively, and a dense residual structure fuses the encoder and decoder outputs. In addition, a super-resolution occlusion detection network is constructed so that deep learning can accurately predict the occlusion between views. The objective function for the light field depth estimation task combines multiple loss functions. The predefined network model is trained on the preprocessed images, and its generalization is finally evaluated on the test set. The invention handles light field images of complex scenes well and achieves more accurate unsupervised depth estimation of light field images.

Description

Monocular light field image unsupervised depth estimation method based on convolutional neural network
Technical Field
The invention relates to the field of light field image processing, in particular to a monocular light field image unsupervised depth estimation method based on a convolutional neural network.
Background
In recent years, single-image depth estimation based on supervised learning has developed rapidly. However, the accurate labels that supervised learning requires are very difficult to acquire, because acquisition is affected by external factors such as environment and illumination, and overcoming these effects is costly. Based on the characteristics of image depth, the invention provides a monocular unsupervised depth estimation method based on a deep convolutional neural network that estimates the depth information of an image quickly and accurately. Because the estimation is unsupervised, no depth labels need to be prepared, which greatly reduces the up-front workload and cost of depth estimation.
Disclosure of Invention
The invention aims to remove the dependence of monocular supervised depth estimation on labeled data sets, and provides an unsupervised depth estimation method for monocular light field images based on a convolutional neural network.
The method takes into account how occlusion between light field views (a 3 × 3 grid of views) affects the consistency of depth estimation. It enhances the data, constructs convolutional neural networks, and provides network loss functions suited to light field images, thereby realizing an accurate discrete mapping from image to depth and making the depth estimation more accurate, faster and more efficient.
To achieve this purpose, the invention provides the following technical scheme, which comprises the following main steps:
1. An unsupervised depth estimation method for monocular light field images based on a convolutional neural network, characterized by comprising the following steps:
step 1, data preprocessing:
the experimental data set is based on the light field image data set published by Stanford, which was captured from real-world objects with a Lytro Illum light field camera;
the data preprocessing comprises image brightness enhancement, horizontal/vertical flipping and random cropping;
after data preprocessing, the light field image data set is expanded, which increases the diversity of the training samples and test samples;
step 2, constructing the models, which comprise a convolutional neural network depth estimation model and a convolutional neural network occlusion detection model;
the convolutional neural network depth estimation model is implemented as follows:
a ResNet50 network model is taken as the encoder Encode, and the original network is improved with adaptive normalization on the basis of ResNet50 so that it suits light field images; the encoder gradually compresses the height and width of the image and increases the number of features; the original input image is denoted $I_{256\times256\times3}$, where the subscript indicates the height, width and number of channels of the image, and the intermediate results of the step-by-step encoding change as follows:
$I^{E}_{256\times256\times64} \rightarrow I^{E}_{128\times128\times128} \rightarrow I^{E}_{64\times64\times256} \rightarrow I^{E}_{32\times32\times512} \rightarrow I^{E}_{16\times16\times1024}$
the decoder reverses this process and restores the height and width of the encoder's feature maps step by step to the size of the original image; a dense residual structure connects the Decode and Encode processes, that is, $I^{E}_{32\times32\times512}$ and $I^{D}_{32\times32\times512}$ are connected through a skip connection;
considering that the disparity range of a light field camera lies in the interval [-4, 4], a Tanh activation function is used to extract the predicted disparity map; because the range of Tanh is [-1, 1], the Tanh output is multiplied by 4 to obtain the real disparity map; the disparity map is acquired with a 4-level pyramid structure, so that the network finally produces fused disparity maps at 4 different scales;
the convolutional neural network occlusion detection model learns the occlusion relations between different views; at the same time, multiple loss functions constrain the training to address image occlusion and the consistency of depth estimation, and adaptive normalization is used in every layer; the network consists of 8 fully convolutional layers, in which layers 1 to 3 form the encoder that extracts features, layers 4 to 6 form the decoder that recovers the image from the features, layer 7 performs a deconvolution to obtain a super-resolution image, and the last layer down-samples to the size of the original image;
step 3, in order to optimize the quality of the disparity map estimated by the network model, the images of the other views are synthesized from the original input image by bilinear-interpolation warping with the estimated disparity map, and the synthesized images of the other views are constrained by the loss functions;
step 4, setting the optimizer and dynamically adjusting the learning rate: the initial learning rate is set to 0.0001 and is reduced as the number of batches increases during training; the model is trained and its parameters are solved with a momentum-based stochastic gradient descent optimization algorithm, in which the momentum factor μ is adjusted dynamically with the fluctuation of the loss; the initial value of μ is set to 0.5, and when the loss fluctuation decreases the network is considered relatively stable and μ is reduced accordingly, so that the learning rate is adjusted dynamically and the training process is refined;
step 5, training the convolutional neural network:
first, 60% of the data samples in the data set of step 1 are selected as the training sample set, and a random seed is set so that each training set obtained is shuffled and uniformly distributed;
second, the loss functions and the optimizer are defined, the network parameters are adjusted and the metrics are recorded;
finally, the network model of step 2 is trained on the data samples, and the model is saved after training so that it can be loaded quickly later;
step 6, testing the convolutional neural network: the evaluation uses PSNR and SSIM, two metrics that quantify image quality and are used here to quantify the quality of the synthesized images; the estimation performance of the model is measured by comparing the data predicted by the model with the test data, which finally gives the accuracy of the model's depth estimation on the test set.
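The disparity output described in step 2 can be illustrated with a minimal sketch. It assumes only what the text states (a Tanh output in [-1, 1] scaled by 4, and 4 pyramid scales); the function names and the choice of halving the resolution at each pyramid level are hypothetical and not taken from the patent.

```python
import math

MAX_DISPARITY = 4.0  # from the text: disparity is assumed to lie in [-4, 4]


def to_disparity(raw_activation: float) -> float:
    """Map an unbounded network output to a disparity in [-4, 4] via Tanh."""
    return MAX_DISPARITY * math.tanh(raw_activation)


def pyramid_sizes(height: int, width: int, levels: int = 4):
    """Spatial sizes of a pyramid that halves the resolution at each level.

    Halving per level is an assumption made for illustration only.
    """
    return [(height >> k, width >> k) for k in range(levels)]


if __name__ == "__main__":
    print([to_disparity(v) for v in (-10.0, 0.0, 0.5, 10.0)])
    print(pyramid_sizes(256, 256))
```

Whatever the pre-activation value, the scaled output stays inside the stated disparity interval, which is the purpose of the Tanh bound.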
Three loss functions are used in the convolutional neural network, defined as follows:
the first loss function I is the image consistency constraint $L_{image\_loss}$, which makes the estimated image as close as possible to the original image; it also includes an image quality constraint, which requires the estimated image and the original image to be locally similar;
the second loss function II is the disparity map consistency constraint $L_{consist}$, which addresses disparity map consistency and disparity occlusion;
the third loss function III is the disparity map smoothness constraint $L_{Smooth}$, which prevents outliers in the estimated disparity map from degrading the final result.
The loss function I is specified as follows:
the loss function I measures the difference between the estimated image and the original image; the L1 distance is used for the comparison and SSIM is used to assess image quality, where the more similar the estimated image is to the original image, the closer the SSIM value is to 1. The expression of loss function I compares the original image at view (i, j) with the estimated image at view (i, j): its first term assesses the quality and local similarity of the predicted image, and its second term measures the distance between the predicted image and the original image, that is, the pixel-by-pixel similarity of pixel values;
the loss function II is specified as follows:
in its expression, $D_{i+x,j+y}$ denotes the disparity map at view (i + x, j + y), and $D_{i,j}+D_{x,y}$ denotes the disparity map at view (i + x, j + y) obtained by warping the disparity map at view (i, j) along the vector (x, y);
the loss function III is specified as follows:
its expression uses the partial derivatives of the disparity map at view (i, j) along the horizontal and vertical coordinates, and the partial derivatives of the original image at view (i, j) along the horizontal and vertical coordinates;
the final total loss function is as follows:
$L_{total} = L_{image\_loss} + L_{consist} + L_{Smooth}$
Constraining the depth estimation of the light field image with multiple loss functions optimizes the result from different aspects and makes it more accurate.
Step 1 is data enhancement. Enhancing the original data makes the network more robust and helps prevent overfitting. Three methods are mainly used: random flipping, color enhancement and random cropping. The original image I is assumed to consist of 9 pixel blocks.
Random flipping comprises vertical flipping and horizontal flipping; the images obtained after flipping are denoted $I_1$ and $I_2$ respectively.
Random color enhancement first generates a random enhancement coefficient, which can be applied to a single RGB color channel or to all three channels together; when the coefficient applied to the three channels is α, the enhanced image is denoted $I_3$.
Random cropping changes the pixel values of one or more regions of the original image to 0 or another value, so that the semantics of the transformed image become ambiguous and discontinuous in those regions; an example of a randomly cropped image is denoted $I_4$.
Although only 3 enhancement methods are used, combining them yields several times as many data samples as the original set. Training the model on the enhanced data gives it stronger robustness and generalization ability, which further improves its prediction accuracy.
The invention has the following beneficial effects:
The invention performs unsupervised depth estimation of monocular light field images based on a convolutional neural network. Because the estimation is unsupervised, no labels need to be prepared for the data set, which makes depth estimation more convenient; at the same time, the model is constrained by multiple loss functions and therefore achieves high prediction accuracy. The improved ResNet50 model generalizes well: it uses a deeper convolutional neural network architecture with good performance, the dense residual structure makes it more robust, and the normalization stabilizes the learning process and effectively speeds up convergence. The super-resolution occlusion detection network effectively addresses occlusion and boundary blurring, and the objective function combines multiple loss functions to guide the optimization of the network model. With suitable training techniques and well-chosen network parameters, optimization algorithm and learning rate settings, the network is more stable, the results are more reliable, and the accuracy of unsupervised depth estimation for light field images is greatly improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
As shown in FIG. 1, the unsupervised depth estimation method for monocular light field images based on a convolutional neural network comprises the following steps:
Step 1, the experimental data set is based on the light field image data set published by Stanford, which was captured from real-world objects with a Lytro Illum light field camera and contains a large number of images of plants, flowers, street views, sculptures and the like. The images are preprocessed and enhanced; the enhancement methods mainly used in the invention are image brightness enhancement, horizontal/vertical flipping and random cropping. Enhancement expands the data set and increases the diversity of the training and test samples, which makes the network model more robust, strengthens its generalization ability so that overfitting is avoided, and improves model performance to a certain extent.
Enhancing the original data makes the network more robust and helps prevent overfitting. Three methods are mainly used: random flipping, color enhancement and random cropping. The original image I is assumed to consist of 9 pixel blocks.
Random flipping comprises vertical flipping and horizontal flipping; the images obtained after flipping are denoted $I_1$ and $I_2$ respectively.
Random color enhancement first generates a random enhancement coefficient, which can be applied to a single RGB color channel or to all three channels together; when the coefficient applied to the three channels is α, the enhanced image is denoted $I_3$.
Random cropping changes the pixel values of one or more regions of the original image to 0 or another value, so that the semantics of the transformed image become ambiguous and discontinuous in those regions; an example of a randomly cropped image is denoted $I_4$.
Although only 3 enhancement methods are used, combining them yields several times as many data samples as the original set. Training the model on the enhanced data gives it stronger robustness and generalization ability, which further improves its prediction accuracy.
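The three enhancement operations can be sketched on a toy image of 9 blocks. The sketch assumes a 3 × 3 arrangement of the blocks, a multiplicative enhancement coefficient, and a single block overwritten per crop; these choices and all function names are illustrative assumptions, not the patent's implementation.

```python
import random


def flip_vertical(img):
    """Reverse the row order (top and bottom swap)."""
    return [row[:] for row in reversed(img)]


def flip_horizontal(img):
    """Reverse each row (left and right swap)."""
    return [list(reversed(row)) for row in img]


def enhance(img, alpha):
    """Scale every value by the coefficient alpha (multiplicative, assumed)."""
    return [[alpha * v for v in row] for row in img]


def random_crop_fill(img, rng, fill=0):
    """Overwrite one randomly chosen block with `fill`, leaving `img` intact."""
    out = [row[:] for row in img]
    r = rng.randrange(len(out))
    c = rng.randrange(len(out[0]))
    out[r][c] = fill
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 9 blocks, assumed 3 x 3
    rng = random.Random(0)
    print(flip_vertical(image))
    print(flip_horizontal(image))
    print(enhance(image, rng.uniform(0.8, 1.2)))
    print(random_crop_fill(image, rng))
```

Each function returns a new image, so the operations can be chained in any order to produce the combined samples mentioned in the text.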
Step 2, an unsupervised depth estimation network is constructed as a convolutional neural network based on ResNet50. The encoder (Encode) and decoder (Decode) networks identify and segment the foreground and background feature regions, so that the extracted feature regions are segmented more accurately and the convolutional neural network learns image features more efficiently and accurately. A 4-level pyramid model and a residual structure fuse the low-level and high-level features learned by the network, which increases the information the network learns.
Step 3, after the network model has processed the images, occlusion remains the largest factor limiting depth estimation accuracy. To learn better occlusion information, a super-resolution convolutional neural network model for occlusion learning is constructed, which learns the occlusion relations between different views. In addition, multiple loss functions constrain the training to address image occlusion and the consistency of depth estimation. Adaptive normalization is used in every layer, which avoids overfitting and improves both the generalization ability of the network and the depth estimation accuracy.
Step 4, to check the quality of the disparity map estimated by the network model, the images of the other views are synthesized from the original central image by bilinear interpolation with the estimated disparity map, and the difference between each estimated image and the corresponding original image is optimized.
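The view synthesis in this step can be illustrated with a minimal single-channel sketch. The sampling convention (a target pixel reads the central view at an offset equal to the view shift times the disparity) and the border clamping are assumptions made for illustration; the patent text does not fix them, and the function names are hypothetical.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2-D list `img` at real-valued (y, x), clamping at the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
    bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp_view(center, disparity, dy, dx):
    """Synthesize the view shifted by (dy, dx) from the central view.

    Assumed convention: pixel (r, c) of the target view is read from
    (r + dy * d, c + dx * d) in the central view, with d = disparity[r][c].
    """
    h, w = len(center), len(center[0])
    return [
        [
            bilinear_sample(center,
                            r + dy * disparity[r][c],
                            c + dx * disparity[r][c])
            for c in range(w)
        ]
        for r in range(h)
    ]
```

With zero disparity the warp returns the central view unchanged, which is a quick sanity check before comparing synthesized and original views in a loss.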
Step 5, defining the loss functions of the network model. To guide the training better, dedicated loss functions are defined for light field images, and 3 loss functions are mainly used as constraints. The first is the image consistency constraint, which makes the estimated image as close as possible to the original image; it also includes an image quality constraint, which requires the estimated image and the original image to be locally similar. The second is the disparity map consistency constraint, which addresses disparity map consistency. The third is the disparity map smoothness constraint, which prevents outliers in the estimated disparity map from degrading the accuracy of the final result. These loss functions need to be redefined so that they suit light field images.
Because the loss function guides network optimization by measuring the error between the predicted value and its target, its quality directly determines the quality of the final result of the network; 3 dedicated loss functions are therefore designed to guide the training.
Loss function 1, image consistency constraint. It measures the difference between the estimated image and the original image; the L1 distance is used for the comparison and SSIM is used to assess image quality, where the more similar the estimated image is to the original image, the closer the SSIM value is to 1.
The expression of this loss function compares the original image at view (i, j) with the estimated image at view (i, j), and α, β and Ψ are hyperparameters. The first term assesses the quality and local similarity of the predicted image, and the second term measures the distance between the predicted image and the original image, that is, the pixel-by-pixel similarity of pixel values.
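The expression itself does not appear in this text. A form commonly used for such SSIM-plus-L1 photometric constraints, and consistent with the two terms described above, is sketched below. The symbols $I_{i,j}$ (original image), $\hat{I}_{i,j}$ (estimated image) and the way α and β weight the two terms are assumptions; the role of Ψ cannot be recovered from the text and is left out.

```latex
L_{image\_loss} = \sum_{(i,j)} \left[
    \alpha \, \frac{1 - \mathrm{SSIM}\!\left(I_{i,j}, \hat{I}_{i,j}\right)}{2}
    + \beta \, \left\lVert I_{i,j} - \hat{I}_{i,j} \right\rVert_{1}
\right]
```

The first term vanishes when SSIM equals 1 (identical local structure), and the second vanishes when the pixel values agree exactly, matching the roles the text assigns to the two terms.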
Loss function 2, disparity map consistency constraint. The invention trains a network dedicated to occlusion detection to predict the occluded regions, and also defines a loss function that constrains the consistency between disparity maps.
In its expression, $D_{i+x,j+y}$ denotes the disparity map at view (i + x, j + y), and $D_{i,j}+D_{x,y}$ denotes the disparity map at view (i + x, j + y) obtained by warping the disparity map at view (i, j) along the vector (x, y). If the disparity estimate is correct, the two terms are equal, that is, they are consistent.
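The consistency expression is likewise absent from this text. One plausible form, given the two quantities just described, penalizes their L1 difference over all views and view offsets. The warping operator $\mathcal{W}_{(x,y)}$ stands in for the quantity the text writes as $D_{i,j}+D_{x,y}$; this notation, the summation ranges, and the absence of an occlusion mask are assumptions.

```latex
L_{consist} = \sum_{(i,j)} \sum_{(x,y)}
    \left\lVert D_{i+x,\,j+y} - \mathcal{W}_{(x,y)}\!\left(D_{i,j}\right) \right\rVert_{1}
```

Under this form the loss is zero exactly when every warped disparity map agrees with the disparity map estimated directly at the target view.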
Loss function 3, disparity smoothness constraint. To eliminate the influence of outliers in the predicted disparity map on the result, a disparity smoothness loss function is defined.
Its expression uses the partial derivatives of the disparity map at view (i, j) along the horizontal and vertical coordinates, and the partial derivatives of the original image at view (i, j) along the horizontal and vertical coordinates. That is, the larger the partial derivative (gradient) of the original image, the smaller the coefficient applied to the partial derivative of the disparity map, and the smoother the disparity map. The final total loss function of the invention is therefore:
$L_{total} = L_{image\_loss} + L_{consist} + L_{Smooth}$
Constraining the depth estimation of the light field image with multiple loss functions optimizes the result from different aspects and makes it more accurate.
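The disparity smoothness constraint (loss function 3) can be illustrated with an edge-aware sketch in which the disparity gradient is weighted by a factor that shrinks as the image gradient grows. The exponential weight, the forward differences and the averaging are assumptions borrowed from common practice; the patent's exact expression is not reproduced in this text, and the function name is hypothetical.

```python
import math


def smoothness_loss(disparity, image):
    """Average edge-aware smoothness penalty over a single-channel image.

    Each disparity difference is weighted by exp(-|image difference|), so
    disparity changes are cheap at image edges and costly in flat regions.
    """
    h, w = len(disparity), len(disparity[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # horizontal forward difference
                d_dx = disparity[r][c + 1] - disparity[r][c]
                i_dx = image[r][c + 1] - image[r][c]
                total += abs(d_dx) * math.exp(-abs(i_dx))
                count += 1
            if r + 1 < h:  # vertical forward difference
                d_dy = disparity[r + 1][c] - disparity[r][c]
                i_dy = image[r + 1][c] - image[r][c]
                total += abs(d_dy) * math.exp(-abs(i_dy))
                count += 1
    return total / count if count else 0.0
```

A disparity step that coincides with an image edge is penalized far less than the same step in a flat image region, which matches the behaviour described for the smoothness term.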
Step 6, dynamically adjusting the learning rate. The initial learning rate is set to 0.0001 and is reduced as the number of batches increases during training, by the following mechanism: if the loss stops decreasing for two or more training batches, the learning rate is decreased to a lower value. The model is trained and its parameters are solved with a momentum-based stochastic gradient descent optimization algorithm, in which the momentum factor μ is adjusted dynamically with the fluctuation of the loss. The initial value of μ is set to 0.5; when the loss fluctuation decreases, the network is considered basically stable and μ is reduced accordingly. This dynamically adjusts the learning rate and refines the training process. In the middle and later stages of training, when the network is converging and its parameters oscillate around a local minimum, the mechanism helps the network escape the local minimum and find better parameters.
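The schedule described in this step can be sketched as a small state machine. The initial learning rate (0.0001), the initial momentum (0.5) and the two-batch patience come from the text; the decay factors `lr_decay` and `momentum_decay`, the definition of "fluctuation" as the absolute change between consecutive losses, and the class name are hypothetical, since the text does not give them.

```python
class AdaptiveSchedule:
    """Reduce the learning rate on stalled loss and momentum on calmer loss."""

    def __init__(self, lr=1e-4, momentum=0.5, patience=2,
                 lr_decay=0.5, momentum_decay=0.9):
        self.lr = lr
        self.momentum = momentum
        self.patience = patience              # batches without improvement
        self.lr_decay = lr_decay              # hypothetical factor
        self.momentum_decay = momentum_decay  # hypothetical factor
        self.best_loss = float("inf")
        self.stalled = 0
        self.prev_loss = None
        self.prev_swing = None

    def update(self, loss):
        """Feed one batch loss; return the (learning rate, momentum) to use."""
        # Learning-rate rule: decay after `patience` batches with no new best.
        if loss < self.best_loss:
            self.best_loss = loss
            self.stalled = 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                self.lr *= self.lr_decay
                self.stalled = 0
        # Momentum rule: shrink mu when the loss fluctuates less than before.
        if self.prev_loss is not None:
            swing = abs(loss - self.prev_loss)
            if self.prev_swing is not None and swing < self.prev_swing:
                self.momentum *= self.momentum_decay
            self.prev_swing = swing
        self.prev_loss = loss
        return self.lr, self.momentum
```

In a training loop, `update` would be called once per batch and its return values passed to the momentum-based stochastic gradient descent optimizer.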
Step 7, the network training module trains the convolutional neural network. First, 60% of the data samples in the data set of step 1 are selected as the training sample set, and a random seed is set so that each training set obtained is shuffled and uniformly distributed. Next, the loss functions of step 5 and the optimizer of step 6 are defined, the network parameters are adjusted and the metrics are recorded. Finally, the network model of step 2 is trained on the data samples, and the model is saved after training so that it can be loaded later.
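The 60% training split with a fixed random seed can be sketched as follows. Treating the remaining 40% as held-out data is an assumption, since the text only specifies the training share; the function name and default seed are hypothetical.

```python
import random


def split_dataset(samples, train_fraction=0.6, seed=0):
    """Shuffle reproducibly and split into (training set, remaining samples)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, rest = split_dataset(range(10))
    print(len(train), len(rest))
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without altering the global random state used elsewhere in training.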
Step 8, the network test module evaluates the model with PSNR and SSIM, two metrics that quantify image quality and are used here to present the quality of the synthesized images. The estimation performance of the model is measured by comparing the data predicted by the model with the test data, which finally gives the accuracy of the model's depth estimation on the test set.
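Of the two metrics, PSNR is simple enough to show in full. The sketch below uses the standard definition for single-channel images given as nested lists; the 8-bit peak value of 255 is an assumption about the image format, and SSIM is omitted because its windowed computation is considerably longer.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for a, b in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the synthesized view is closer to the captured view, so the metric can be averaged over all synthesized views of the test set.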

Claims (3)

1. An unsupervised depth estimation method for monocular light field images based on a convolutional neural network, characterized by comprising the following steps:
step 1, data preprocessing:
the experimental data set is based on the light field image data set published by Stanford, which was captured from real-world objects with a Lytro Illum light field camera;
the data preprocessing comprises image brightness enhancement, horizontal/vertical flipping and random cropping;
after data preprocessing, the light field image data set is expanded, which increases the diversity of the training samples and test samples;
step 2, constructing the models, which comprise a convolutional neural network depth estimation model and a convolutional neural network occlusion detection model;
the convolutional neural network depth estimation model is implemented as follows:
a ResNet50 network model is taken as the encoder Encode, and the original network is improved with adaptive normalization on the basis of ResNet50 so that it suits light field images; the encoder gradually compresses the height and width of the image and increases the number of features; the original input image is denoted $I_{256\times256\times3}$, where the subscript indicates the height, width and number of channels of the image, and the intermediate results of the step-by-step encoding change as follows:
$I^{E}_{256\times256\times64} \rightarrow I^{E}_{128\times128\times128} \rightarrow I^{E}_{64\times64\times256} \rightarrow I^{E}_{32\times32\times512} \rightarrow I^{E}_{16\times16\times1024}$
the decoder reverses this process and restores the height and width of the encoder's feature maps step by step to the size of the original image; a dense residual structure connects the Decode and Encode processes, that is, $I^{E}_{32\times32\times512}$ and $I^{D}_{32\times32\times512}$ are connected through a skip connection;
considering that the disparity range of a light field camera lies in the interval [-4, 4], a Tanh activation function is used to extract the predicted disparity map; because the range of Tanh is [-1, 1], the Tanh output is multiplied by 4 to obtain the real disparity map; the disparity map is acquired with a 4-level pyramid structure, so that the network finally produces fused disparity maps at 4 different scales;
the convolutional neural network occlusion detection model learns the occlusion relations between different views; at the same time, multiple loss functions constrain the training to address image occlusion and the consistency of depth estimation, and adaptive normalization is used in every layer; the network consists of 8 fully convolutional layers, in which layers 1 to 3 form the encoder that extracts features, layers 4 to 6 form the decoder that recovers the image from the features, layer 7 performs a deconvolution to obtain a super-resolution image, and the last layer down-samples to the size of the original image;
step 3, in order to optimize the quality of the disparity map estimated by the network model, the images of the other views are synthesized from the original input image by bilinear-interpolation warping with the estimated disparity map, and the synthesized images of the other views are constrained by the loss functions;
step 4, setting the optimizer and dynamically adjusting the learning rate: the initial learning rate is set to 0.0001 and is reduced as the number of batches increases during training; the model is trained and its parameters are solved with a momentum-based stochastic gradient descent optimization algorithm, in which the momentum factor μ is adjusted dynamically with the fluctuation of the loss; the initial value of μ is set to 0.5, and when the loss fluctuation decreases the network is considered relatively stable and μ is reduced accordingly, so that the learning rate is adjusted dynamically and the training process is refined;
step 5, training a convolutional neural network:
firstly, selecting 60% of the data samples in the data set of step 1 as the training sample set, and setting a random value so that the training set obtained each time consists of shuffled, uniformly distributed samples;
secondly, defining the loss function and the optimizer, adjusting the network parameters, and recording the statistical indexes;
finally, training the network model of step 2 on the data samples, and saving the model after training is finished so that it can be loaded quickly at a later stage;
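The 60% training split of step 5 could be drawn as in the sketch below. The function name and the use of a fixed seed as the "random value" are assumptions; the claim also does not say how the remaining 40% is used, so it is simply returned as a held-out set here.

```python
import random


def split_dataset(samples, train_fraction=0.6, seed=0):
    """Shuffle the samples with a seeded generator and return (train, held_out)."""
    rng = random.Random(seed)      # private generator, so global random state is untouched
    shuffled = list(samples)       # copy, so the caller's sequence is not reordered
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Calling the function twice with the same seed returns the same split, which keeps training runs comparable.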
step 6, testing the convolutional neural network: evaluation uses PSNR and SSIM, two indexes that quantify image quality and are used here to present the quality of the synthesized images; the accuracy index measures the estimation performance of the model by comparing the data predicted by the model with the test data, finally giving the accuracy of the model's depth estimation on the test set.
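For reference, PSNR as used in the testing step can be computed as in this minimal sketch. It assumes 8-bit images stored as lists of rows and uses the standard definition 10·log10(MAX² / MSE); the function name is illustrative. SSIM is usually taken from an image-processing library and is not reproduced here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for a, b in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the synthesized view is closer to the captured view; identical images give infinity.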
2. The unsupervised depth estimation method for monocular light field images based on convolutional neural network as claimed in claim 1, wherein there are 3 loss functions in the convolutional neural network, specifically defined as follows:
the first loss function I is the image consistency constraint $L_{image\_loss}$, which makes the estimated image as close as possible to the original image and also includes an image quality constraint requiring the estimated image and the original image to be locally similar;
the second loss function II is the disparity map consistency constraint $L_{consist}$, which addresses the disparity map consistency problem and the disparity occlusion problem;
the third loss function III is the disparity map smoothness constraint $L_{Smooth}$, which prevents outliers in the estimated disparity map from degrading the final result.
3. The unsupervised depth estimation method for monocular light field images based on convolutional neural network as claimed in claim 2, wherein the loss function I is specifically as follows:
the loss function I measures the difference between the estimated image and the original image: the L1 distance is used for comparison and SSIM is used to assess image quality, the value of SSIM being closer to 1 the more similar the estimated image is to the original image; the loss function I is expressed as follows:
[equation not reproduced in the source text]
wherein the two image symbols in the formula denote, respectively, the original image at view angle (i, j) and the estimated image at view angle (i, j); α, β and ψ are hyper-parameters; the first term of the formula measures the quality of the predicted image, i.e. its local similarity, and the second term measures the distance between the predicted image and the original image, i.e. the pixel-by-pixel similarity of pixel values;
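Because the formula itself is not reproduced, the following Python sketch only shows one plausible shape for a loss with an SSIM quality term and an L1 distance term. The weighting α·(1 − SSIM)/2 + β·L1, the default values of `alpha` and `beta`, and the simplified SSIM computed from whole-image statistics (standard SSIM averages over local windows) are all assumptions. The role of ψ cannot be recovered from the text, so it is omitted. Pixel values are assumed to lie in [0, 1].

```python
def _mean(values):
    return sum(values) / len(values)


def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM from whole-image means, variances and covariance."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = _mean(xs), _mean(ys)
    vx = _mean([(x - mx) ** 2 for x in xs])
    vy = _mean([(y - my) ** 2 for y in ys])
    cov = _mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def image_loss(original, estimate, alpha=0.85, beta=0.15):
    """Assumed form: alpha * (1 - SSIM) / 2 + beta * mean absolute pixel difference."""
    xs = [v for row in original for v in row]
    ys = [v for row in estimate for v in row]
    l1 = _mean([abs(x - y) for x, y in zip(xs, ys)])
    return alpha * (1.0 - global_ssim(original, estimate)) / 2.0 + beta * l1
```

The loss is close to zero when the estimated view matches the original and grows as either the structural similarity drops or the pixel values drift apart.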
the loss function II is specifically defined as follows:
[equation not reproduced in the source text]
wherein $D_{i+x,j+y}$ denotes the disparity map at view angle (i + x, j + y), and the second disparity symbol in the formula denotes the disparity map at view angle (i + x, j + y) obtained by warping the disparity map at view angle (i, j) along the vector (x, y);
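A minimal sketch of a disparity consistency term is given below. It assumes the penalty is the mean absolute difference between each neighboring-view disparity map and the corresponding map warped from view (i, j), averaged over the offsets (x, y); the claim text does not state the norm or the averaging, and the dictionary layout keyed by offset is hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D maps."""
    diffs = [abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def consistency_loss(neighbor_disps, warped_disps):
    """Average L1 gap between D at (i+x, j+y) and D at (i, j) warped by (x, y).

    Both arguments are dicts keyed by the offset (x, y); each value is a 2D map.
    """
    offsets = list(neighbor_disps)
    total = sum(mean_abs_diff(neighbor_disps[k], warped_disps[k]) for k in offsets)
    return total / len(offsets)
```

The term is zero when every warped disparity map agrees with the disparity map predicted directly for that view.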
the loss function III is defined as follows:
[equation not reproduced in the source text]
wherein the first pair of symbols in the formula denote the partial derivatives of the disparity map at view angle (i, j) along the horizontal and vertical directions, and the second pair denote the partial derivatives of the original image at view angle (i, j) along the horizontal and vertical directions;
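Since the definition pairs disparity gradients with image gradients, one common way to combine them is an edge-aware smoothness penalty, sketched below. The specific form |∂D|·exp(−|∂I|), the forward-difference derivatives, and the averaging are assumptions borrowed from common practice in unsupervised depth estimation, not taken from the claim.

```python
import math


def smoothness_loss(disp, image):
    """Edge-aware smoothness: disparity gradients are penalized less across image edges."""
    h, w = len(disp), len(disp[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal forward difference
                d_grad = abs(disp[y][x + 1] - disp[y][x])
                i_grad = abs(image[y][x + 1] - image[y][x])
                total += d_grad * math.exp(-i_grad)
                count += 1
            if y + 1 < h:  # vertical forward difference
                d_grad = abs(disp[y + 1][x] - disp[y][x])
                i_grad = abs(image[y + 1][x] - image[y][x])
                total += d_grad * math.exp(-i_grad)
                count += 1
    return total / count if count else 0.0
```

A disparity jump that coincides with an intensity edge costs less than the same jump inside a uniform region, which discourages isolated outliers without blurring object boundaries.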
the final overall loss function is as follows:
$L_{total} = L_{image\_loss} + L_{consist} + L_{Smooth}$
CN201910276356.0A 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network Active CN110163246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910276356.0A CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910276356.0A CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN110163246A true CN110163246A (en) 2019-08-23
CN110163246B CN110163246B (en) 2021-03-30

Family

ID=67638504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910276356.0A Active CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110163246B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070122028A1 (en) * 2005-11-30 2007-05-31 Microsoft Corporation Symmetric stereo model for handling occlusion
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
US20140153784A1 (en) * 2012-10-18 2014-06-05 Thomson Licensing Spatio-temporal confidence maps
CN106612427A (en) * 2016-12-29 2017-05-03 浙江工商大学 Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method
CN108846473A (en) * 2018-04-10 2018-11-20 杭州电子科技大学 Light field depth estimation method based on direction and dimension self-adaption convolutional neural networks
CN108961327A (en) * 2018-05-22 2018-12-07 深圳市商汤科技有限公司 A kind of monocular depth estimation method and its device, equipment and storage medium
CN109191515A (en) * 2018-07-25 2019-01-11 北京市商汤科技开发有限公司 A kind of image parallactic estimation method and device, storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
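TING-CHUN WANG et al.: "Depth Estimation with Occlusion Modeling Using Light-Field Cameras", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE *
XIAORAN JIANG et al.: "Depth Estimation with Occlusion Handling from a Sparse Set of Light Field Views", 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) *
YAOXIANG LUO et al.: "EPI-Patch Based Convolutional Neural Network for Depth Estimation on 4D Light Field", NEURAL INFORMATION PROCESSING *
LI PENGFEI (李鹏飞): "Research on Light Field Camera Depth Estimation Based on Depth Cues and Occlusion Detection" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
LUO YAOXIANG (罗姚翔): "Research on Light Field Image Depth Estimation Based on Convolutional Neural Networks" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *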

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503082B (en) * 2019-08-30 2024-03-12 腾讯科技(深圳)有限公司 Model training method based on deep learning and related device
CN110503082A (en) * 2019-08-30 2019-11-26 腾讯科技(深圳)有限公司 A kind of model training method and relevant apparatus based on deep learning
CN110689060A (en) * 2019-09-16 2020-01-14 西安电子科技大学 Heterogeneous image matching method based on aggregation feature difference learning network
CN110689060B (en) * 2019-09-16 2022-01-28 西安电子科技大学 Heterogeneous image matching method based on aggregation feature difference learning network
CN112734864A (en) * 2019-10-28 2021-04-30 天津大学青岛海洋技术研究院 Three-way convolution neural network structure for coloring gray level image
CN110956202A (en) * 2019-11-13 2020-04-03 重庆大学 Image training method, system, medium and intelligent device based on distributed learning
CN111047630A (en) * 2019-11-13 2020-04-21 芯启源(上海)半导体科技有限公司 Neural network and target detection and depth prediction method based on neural network
CN111047630B (en) * 2019-11-13 2023-06-13 芯启源(上海)半导体科技有限公司 Neural network and target detection and depth prediction method based on neural network
CN110942484A (en) * 2019-11-26 2020-03-31 福州大学 Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN110942484B (en) * 2019-11-26 2022-07-12 福州大学 Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN111242131A (en) * 2020-01-06 2020-06-05 北京十六进制科技有限公司 Method, storage medium and device for image recognition in intelligent marking
CN111242131B (en) * 2020-01-06 2024-05-10 北京十六进制科技有限公司 Method, storage medium and device for identifying images in intelligent paper reading
CN111222481A (en) * 2020-01-14 2020-06-02 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Method and device for identifying clothes color
CN111222481B (en) * 2020-01-14 2022-09-09 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Method and device for identifying clothes color
CN113139553A (en) * 2020-01-16 2021-07-20 中国科学院国家空间科学中心 U-net-based method and system for extracting aurora ovum form of ultraviolet aurora image
CN111325782A (en) * 2020-02-18 2020-06-23 南京航空航天大学 Unsupervised monocular view depth estimation method based on multi-scale unification
CN113393510A (en) * 2020-03-12 2021-09-14 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN113410861A (en) * 2020-03-17 2021-09-17 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Droop control parameter optimization method suitable for multi-terminal flexible direct current system
CN113538575A (en) * 2020-04-20 2021-10-22 辉达公司 Distance determination using one or more neural networks
CN111476835A (en) * 2020-05-21 2020-07-31 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111476835B (en) * 2020-05-21 2021-08-10 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111597524B (en) * 2020-05-22 2021-03-23 江苏濠汉信息技术有限公司 Verification method and system for seal sample sampling personnel
CN111597524A (en) * 2020-05-22 2020-08-28 江苏濠汉信息技术有限公司 Verification method and system for seal sample sampling personnel
CN111899295A (en) * 2020-06-06 2020-11-06 东南大学 Monocular scene depth prediction method based on deep learning
CN111833390A (en) * 2020-06-23 2020-10-27 杭州电子科技大学 Light field depth estimation method based on unsupervised depth learning
CN112633052A (en) * 2020-09-15 2021-04-09 北京华电天仁电力控制技术有限公司 Belt tearing detection method
CN112270692B (en) * 2020-10-15 2022-07-05 电子科技大学 Monocular video structure and motion prediction self-supervision method based on super-resolution
CN112270692A (en) * 2020-10-15 2021-01-26 电子科技大学 Monocular video structure and motion prediction self-supervision method based on super-resolution
CN112330724A (en) * 2020-10-15 2021-02-05 贵州大学 Unsupervised multi-modal image registration method based on integrated attention enhancement
CN112330724B (en) * 2020-10-15 2024-04-09 贵州大学 Integrated attention enhancement-based unsupervised multi-modal image registration method
CN112288789B (en) * 2020-10-26 2024-03-29 杭州电子科技大学 Light field depth self-supervision learning method based on iterative optimization of shielding region
CN112288789A (en) * 2020-10-26 2021-01-29 杭州电子科技大学 Light field depth self-supervision learning method based on occlusion region iterative optimization
CN112215303B (en) * 2020-11-05 2022-02-11 北京理工大学 Image understanding method and system based on self-learning attribute
CN112215303A (en) * 2020-11-05 2021-01-12 北京理工大学 Image understanding method and system based on self-learning attribute
CN113516141A (en) * 2020-11-06 2021-10-19 阿里巴巴集团控股有限公司 Method and device for optimizing depth measurement model and storage medium
CN113516141B (en) * 2020-11-06 2024-03-01 阿里巴巴集团控股有限公司 Optimization method, equipment and storage medium of depth measurement model
CN112396645B (en) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112396645A (en) * 2020-11-06 2021-02-23 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112435198A (en) * 2020-12-03 2021-03-02 西安交通大学 Welding seam radiographic inspection negative image enhancement method, storage medium and equipment
CN112488033A (en) * 2020-12-10 2021-03-12 北京金山云网络技术有限公司 Data set construction method and device, electronic equipment and storage medium
CN112488033B (en) * 2020-12-10 2024-10-18 北京金山云网络技术有限公司 Data set construction method and device, electronic equipment and storage medium
CN112561818B (en) * 2020-12-14 2024-05-28 英特灵达信息技术(深圳)有限公司 Image enhancement method and device, electronic equipment and storage medium
CN112561818A (en) * 2020-12-14 2021-03-26 英特灵达信息技术(深圳)有限公司 Image enhancement method and device, electronic equipment and storage medium
CN112561979A (en) * 2020-12-25 2021-03-26 天津大学 Self-supervision monocular depth estimation method based on deep learning
CN112561979B (en) * 2020-12-25 2022-06-28 天津大学 Self-supervision monocular depth estimation method based on deep learning
CN112785637B (en) * 2021-01-20 2022-10-11 大连理工大学 Light field depth estimation method based on dynamic fusion network
CN112785637A (en) * 2021-01-20 2021-05-11 大连理工大学 Light field depth estimation method based on dynamic fusion network
CN112819742B (en) * 2021-02-05 2022-05-13 武汉大学 Event field synthetic aperture imaging method based on convolutional neural network
CN112819742A (en) * 2021-02-05 2021-05-18 武汉大学 Event field synthetic aperture imaging method based on convolutional neural network
CN113361378A (en) * 2021-06-02 2021-09-07 合肥工业大学 Human body posture estimation method using adaptive data enhancement
CN113592913A (en) * 2021-08-09 2021-11-02 中国科学院深圳先进技术研究院 Method for eliminating uncertainty of self-supervision three-dimensional reconstruction
CN113592913B (en) * 2021-08-09 2023-12-26 中国科学院深圳先进技术研究院 Method for eliminating uncertainty of self-supervision three-dimensional reconstruction
CN116127844B (en) * 2023-02-08 2023-10-31 大连海事大学 Flow field time interval deep learning prediction method considering flow control equation constraint
CN116127844A (en) * 2023-02-08 2023-05-16 大连海事大学 Flow field time interval deep learning prediction method considering flow control equation constraint

Also Published As

Publication number Publication date
CN110163246B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231212

Address after: 311400 4th floor, building 9, Yinhu innovation center, No.9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Xinmai Microelectronics Co.,Ltd.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University