CN110163246B - Monocular light field image unsupervised depth estimation method based on convolutional neural network - Google Patents

Monocular light field image unsupervised depth estimation method based on convolutional neural network

Info

Publication number
CN110163246B
CN110163246B (application CN201910276356.0A)
Authority
CN
China
Prior art keywords
image
model
network
light field
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910276356.0A
Other languages
Chinese (zh)
Other versions
CN110163246A (en)
Inventor
戴国骏
刘高敏
张桦
周文晖
陶星
戴美想
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinmai Microelectronics Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910276356.0A priority Critical patent/CN110163246B/en
Publication of CN110163246A publication Critical patent/CN110163246A/en
Application granted granted Critical
Publication of CN110163246B publication Critical patent/CN110163246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an unsupervised depth estimation method for monocular light field images based on a convolutional neural network. The method first uses a publicly available large-scale light field image data set as the training set, balancing the training samples through data enhancement and data expansion. An improved ResNet50 network model is built, in which an encoder and a decoder extract high-level and low-level features respectively and their results are fused through a dense residual structure; at the same time, a super-resolution occlusion detection network is built, so that deep learning can accurately predict the occlusion between the viewing angles. The objective function of the light field depth estimation task is a multi-term loss function; the preprocessed images are trained with the predefined network model, and the generalization of the network model is finally evaluated on a test set. The method preprocesses light field images of complex scenes effectively and achieves more accurate unsupervised depth estimation of light field images.

Description

Monocular light field image unsupervised depth estimation method based on convolutional neural network
Technical Field
The invention relates to the field of light field image processing, in particular to a monocular light field image unsupervised depth estimation method based on a convolutional neural network.
Background
In recent years, supervised single-image depth estimation has developed rapidly, but the accurate labels required by supervised learning are very difficult to acquire: they are affected by many external factors such as environment and illumination, and overcoming these influences is extremely costly. Based on the characteristics of image depth, the invention discloses a monocular unsupervised depth estimation method based on a deep convolutional neural network that can estimate the depth information of an image quickly and accurately. Because the estimation is unsupervised, no depth information labels need to be specially made, which greatly reduces the up-front workload and cost of depth estimation.
Disclosure of Invention
The invention aims to solve the label-acquisition problem of supervised monocular depth estimation data sets, and provides a monocular light field image unsupervised depth estimation method based on a convolutional neural network.
The method considers the influence of occlusion between the different viewing angles of a light field image (a 3×3 grid of views) on the consistency of depth estimation; it enhances the data, constructs a convolutional neural network, and provides a loss function suited to light field images, realizing an accurate discrete mapping from image to depth and making image depth estimation more accurate, rapid and efficient.
In order to achieve this purpose, the invention provides the following technical scheme, comprising the following main steps:
1. A method for unsupervised depth estimation of monocular light field images based on a convolutional neural network, characterized by comprising the following steps:
step 1, data preprocessing:
the experimental data set is the publicly available Stanford light field image data set, captured with a Lytro Illum light field camera of real objects in the real world;
the data preprocessing comprises image brightness enhancement, horizontal/vertical flipping and random cropping;
after data preprocessing, the light field image data set is further expanded, increasing the diversity of the training samples and testing samples;
step 2, constructing models including a convolutional neural network depth estimation model and a convolutional neural network occlusion detection model;
the depth estimation model of the convolutional neural network is specifically realized as follows:
taking a ResNet50 network model as the encoder (Encode), and improving the original network with adaptive normalization on the basis of ResNet50 to suit light field images; the encoder gradually compresses the length and width of the image while increasing the number of features; let the original input image be I_{256×256×3}, where the subscript denotes the length, width and number of channels of the image, so that the intermediate results of step-by-step encoding evolve as follows:
I^E_{256×256×64} → I^E_{128×128×128} → I^E_{64×64×256} → I^E_{32×32×512} → I^E_{16×16×1024}
the decoder reverses this process, restoring the length and width of the encoder's feature maps step by step to the original image size; a dense residual structure connects the Decode and Encode processes, i.e. I^E_{32×32×512} and I^D_{32×32×512} are connected together through a skip layer;
considering that the disparity range of the light field camera lies in the interval [-4, 4], a Tanh activation function is adopted to extract the predicted disparity map; since the range of Tanh is [-1, 1], the obtained map is multiplied by 4 to recover the true disparity map; the disparity map is acquired with a 4-level pyramid structure, so that the network finally fuses disparity maps of 4 different scales (an illustrative sketch of this disparity head follows step 2);
the convolutional neural network occlusion detection model is used for learning the occlusion relations between different viewing angles; at the same time, multiple loss functions constrain the training, solving the image occlusion problem and the depth estimation consistency problem, and adaptive regularization is used in each layer; the network consists of 8 fully convolutional layers: layers 1 to 3 form the encoder that extracts features, layers 4 to 6 form the decoder that recovers the image from the features, layer 7 performs a deconvolution to obtain a super-resolution image, and the last layer downsamples back to the original image size;
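For illustration only, the Tanh disparity head described in step 2 can be sketched in PyTorch as follows; the class name, channel widths and kernel size are assumptions, while the Tanh activation and the multiplication by 4 to reach the [-4, 4] disparity range follow the description above:

    import torch
    import torch.nn as nn

    class DisparityHead(nn.Module):
        """Predicts a one-channel disparity map bounded to [-4, 4]."""
        def __init__(self, in_channels, max_disp=4.0):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
            self.max_disp = max_disp

        def forward(self, features):
            # Tanh bounds the raw prediction to [-1, 1]; multiplying by 4
            # stretches it to the light field camera's range [-4, 4].
            return self.max_disp * torch.tanh(self.conv(features))

    # One head per pyramid level, e.g. decoder features at 4 scales
    # (channel counts assumed), whose outputs the network finally fuses:
    heads = nn.ModuleList([DisparityHead(c) for c in (512, 256, 128, 64)])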
step 3, in order to optimize the quality of the disparity map estimated by the network model, images at other viewing angles are synthesized from the original input image by bilinear-interpolation warping with the estimated disparity map, and the synthesized views are constrained through a loss function;
step 4, setting an optimizer: the learning rate is dynamically optimized and adjusted so that the model always has an ideal learning rate; the initial learning rate is set to 0.0001 and is slowed down as the number of batches increases during training; the slowing mechanism is as follows: the model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm, and the momentum factor mu is dynamically adjusted with the fluctuation of the loss, starting from an initial value of 0.5; when the loss fluctuation decreases, the network is considered relatively stable and mu is reduced accordingly, thereby dynamically adjusting the learning rate and refining the training process;
step 5, training a convolutional neural network:
firstly, 60% of the data samples of the data set in step 1 are selected as the training sample set, and a random seed is set so that the training set obtained each time is a shuffled, uniformly distributed sample;
secondly, defining the loss functions and the optimizer, adjusting the network parameters and recording the evaluation indexes;
finally, the network model of step 2 is used as the training model to train on the data samples, and the model is saved after training so that it can be quickly loaded later;
step 6, testing the convolutional neural network: PSNR and SSIM are used for evaluation; these two indexes quantify image quality accuracy and present the quality of the synthesized images; by comparing the data predicted by the model with the test data, the accuracy index measures the estimation performance of the model, finally yielding the depth estimation accuracy of the model on the test set.
The loss functions in the convolutional neural network are set to 3, specifically defined as follows:
the first loss function I is the image consistency constraint L_image_loss, which makes the estimated image as close as possible to the original image; it also contains an image quality constraint, which requires the estimated image and the original image to be locally similar;
the second loss function II is the disparity map consistency constraint L_consist, which solves the disparity map consistency problem and the disparity occlusion problem;
the third loss function III is the disparity map smoothness constraint L_smooth, which prevents outliers in the estimated disparity map from degrading the final result.
The loss function I is specified as follows:
the loss function I measures the difference between the estimated image and the original image; the L1 distance is used for comparison, and SSIM is used to detect image quality (the more similar the estimated image is to the original image, the closer the SSIM value is to 1); the loss function I is expressed as follows:
L_image_loss = Σ_{(i,j)} [ α · (1 - SSIM(I_{i,j}, Î_{i,j})) + β · |I_{i,j} - Î_{i,j}| ]
where I_{i,j} represents the original image at viewing angle (i, j), Î_{i,j} represents the estimated image at viewing angle (i, j), and α, β and ψ all represent hyperparameters; the first term of the formula detects the quality and local similarity of the predicted image, and the second term detects the distance between the predicted image and the original image, i.e. the pixel-by-pixel similarity of pixel values;
the loss function II is specifically defined as follows:
L_consist = Σ_{(x,y)} | D_{i+x,j+y} - W_{(x,y)}(D_{i,j}) |
where D_{i+x,j+y} denotes the disparity map at the (i+x, j+y) viewing angle, and W_{(x,y)}(D_{i,j}) denotes the disparity map at the (i+x, j+y) viewing angle obtained from the disparity map at the (i, j) viewing angle by warping along the vector (x, y);
the loss function III is defined as follows:
L_smooth = Σ_{(i,j)} ( |∂_x D_{i,j}| · e^{-|∂_x I_{i,j}|} + |∂_y D_{i,j}| · e^{-|∂_y I_{i,j}|} )
where ∂_x D_{i,j} and ∂_y D_{i,j} respectively represent the partial derivatives of the disparity map at viewing angle (i, j) with respect to the abscissa and ordinate, and ∂_x I_{i,j} and ∂_y I_{i,j} respectively represent the partial derivatives of the original image at viewing angle (i, j) with respect to the abscissa and ordinate;
the final overall loss function is as follows:
L_total = L_image_loss + L_consist + L_smooth
By defining the depth estimation of the light field image through this multi-term loss function, the result can be optimized from different aspects, making it more accurate.
Step 1 performs data enhancement: after enhancement operations on the original data, the network becomes more robust and overfitting is prevented; three methods are mainly used: random flipping, color enhancement and random cropping. Suppose the original image I is composed of 9 pixel blocks, denoted a through i:

I = | a b c |
    | d e f |
    | g h i |

Random flipping has two variants, vertical and horizontal; the images obtained after flipping, denoted I1 and I2 respectively, are:

I1 = | g h i |        I2 = | c b a |
     | d e f |             | f e d |
     | a b c |             | i h g |

Random color enhancement first generates a random enhancement coefficient, either for a single RGB color channel or directly for all three channels; if the coefficient for all three channels is α, the enhanced image I3 is:

I3 = | αa αb αc |
     | αd αe αf |
     | αg αh αi |

Random cropping changes the pixel values of one or more regions of the original image to 0 or other values, so that the semantics of the changed image become ambiguous and discontinuous in those regions; one example I4 is:

I4 = | a b c |
     | d 0 f |
     | g h i |

Although only 3 enhancement methods are used, combining them yields data samples many times the size of the original set; training the model on the enhanced data gives it stronger robustness and generalization ability, further improving its prediction accuracy.
The invention has the following beneficial effects:
the invention relates to a monocular light field image unsupervised depth estimation based on a convolutional neural network, which does not need to specially manufacture a label of a data set due to unsupervised estimation, so that the depth estimation is more convenient, and meanwhile, the model is restrained by using a multi-loss function, so that the model has high prediction precision. The improved ResNet50 model has good generalization performance, a convolutional neural network model framework with deeper depth is used, the performance is good, the robustness is stronger due to a dense poor structure, the learning process can be stabilized through case regularization, the model convergence rate is effectively improved, the problems of occlusion and boundary blurring and occlusion are effectively solved by the super-resolution occlusion detection network, and the target function combines multiple loss functions to serve as a network model optimizer. By properly adopting some training skills and selecting ideal network parameters, an optimization algorithm and the setting of the learning rate, the network is more stable, the result is more reliable, and the unsupervised depth estimation accuracy of the light field image is greatly improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
As shown in fig. 1, the method for unsupervised depth estimation of monocular light field images based on convolutional neural network specifically includes the following steps:
Step 1: the experimental data set is the publicly available Stanford light field image data set, captured with a Lytro Illum light field camera of real objects in the real world; it contains a large number of images of plants, flowers, street views, sculptures and the like. The images are preprocessed and enhanced; the enhancement methods mainly used in the invention include image brightness enhancement, horizontal/vertical flipping and random cropping. After enhancement, the data set is further expanded and the diversity of the training and testing samples increases, which further strengthens the robustness of the network model and its generalization ability so that no overfitting occurs; it also improves model performance to a certain extent.
After enhancement operations on the original data, the network becomes more robust and overfitting is prevented; three methods are mainly used: random flipping, color enhancement and random cropping. Suppose the original image I is composed of 9 pixel blocks, denoted a through i:

I = | a b c |
    | d e f |
    | g h i |

Random flipping has two variants, vertical and horizontal; the images obtained after flipping, denoted I1 and I2 respectively, are:

I1 = | g h i |        I2 = | c b a |
     | d e f |             | f e d |
     | a b c |             | i h g |

Random color enhancement first generates a random enhancement coefficient, either for a single RGB color channel or directly for all three channels; if the coefficient for all three channels is α, the enhanced image I3 is:

I3 = | αa αb αc |
     | αd αe αf |
     | αg αh αi |

Random cropping changes the pixel values of one or more regions of the original image to 0 or other values, so that the semantics of the changed image become ambiguous and discontinuous in those regions; one example I4 is:

I4 = | a b c |
     | d 0 f |
     | g h i |

Although only 3 enhancement methods are used, combining them yields data samples many times the size of the original set; training the model on the enhanced data gives it stronger robustness and generalization ability, further improving its prediction accuracy.
Step 2: an unsupervised depth estimation network based on ResNet50 is constructed; an encoder (Encode) and decoder (Decode) network identifies foreground and background feature regions and performs feature segmentation, so that the extracted feature regions achieve higher segmentation accuracy and the convolutional neural network learns image features more efficiently and accurately. A 4-level pyramid model with a residual structure fuses the low-level and high-level features learned by the network, increasing the information the network learns.
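A skeleton of the encoder-decoder wiring is sketched below; the channel and resolution progression follows the one given in step 2 of the summary, but plain strided convolutions stand in for the improved ResNet50 blocks with adaptive normalization, so this is an assumption-laden sketch rather than the patented model:

    import torch
    import torch.nn as nn

    def down(c_in, c_out):  # halve resolution, grow features (encoder step)
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                             nn.ReLU(inplace=True))

    def up(c_in, c_out):    # double resolution, shrink features (decoder step)
        return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                             nn.ReLU(inplace=True))

    class EncoderDecoder(nn.Module):
        """Encoder follows 256x256x64 -> ... -> 16x16x1024, the decoder
        mirrors it, and skip connections fuse encoder and decoder
        features (the dense residual structure)."""
        def __init__(self):
            super().__init__()
            self.stem = nn.Conv2d(3, 64, 3, padding=1)
            chans = [64, 128, 256, 512, 1024]
            self.enc = nn.ModuleList(down(chans[i], chans[i + 1]) for i in range(4))
            self.dec = nn.ModuleList(up(chans[i + 1], chans[i]) for i in reversed(range(4)))

        def forward(self, x):
            feats, f = [], self.stem(x)
            for enc in self.enc:
                feats.append(f)
                f = enc(f)
            for dec, skip in zip(self.dec, reversed(feats)):
                f = dec(f) + skip  # skip-layer fusion of E- and D-features
            return f

    out = EncoderDecoder()(torch.rand(1, 3, 256, 256))  # -> (1, 64, 256, 256)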
Step 3: occlusion is always the largest factor affecting the precision of depth estimation. To learn better occlusion information, a super-resolution convolutional neural network model for occlusion learning is constructed, which learns the occlusion relations between different viewing angles. In addition, multiple loss functions constrain the training, solving the image occlusion problem and the depth estimation consistency problem; adaptive regularization is used in each layer, which avoids overfitting and also improves the generalization ability of the network and the precision of the depth estimation.
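The 8-layer fully convolutional occlusion network can be sketched as follows; the patent fixes only the layer roles (3 encoder layers, 3 decoder layers, 1 deconvolution for super-resolution, 1 downsampling layer), so the kernel sizes, strides and channel widths here are assumptions:

    import torch
    import torch.nn as nn

    def conv(c_in, c_out, stride=1):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                             nn.ReLU(inplace=True))

    class OcclusionNet(nn.Module):
        """8 fully convolutional layers: 1-3 encoder (feature extraction),
        4-6 decoder (image recovery), 7 deconvolution (super-resolution),
        8 downsampling back to the original image size."""
        def __init__(self, in_channels=3, out_channels=1):
            super().__init__()
            self.encoder = nn.Sequential(conv(in_channels, 32), conv(32, 64), conv(64, 128))
            self.decoder = nn.Sequential(conv(128, 64), conv(64, 32), conv(32, 32))
            self.deconv = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)  # layer 7: 2x up
            self.down = nn.Conv2d(16, out_channels, 3, stride=2, padding=1)   # layer 8: back down

        def forward(self, x):
            return self.down(self.deconv(self.decoder(self.encoder(x))))

    occ = OcclusionNet()(torch.rand(1, 3, 256, 256))  # -> (1, 1, 256, 256)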
Step 4: to test the quality of the disparity map estimated by the network model, images at other viewing angles are synthesized from the original central image through the estimated disparity map by bilinear interpolation, and the synthesized images are optimized against the original images.
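A hedged sketch of the bilinear warping used to synthesize a view at angular offset (dx, dy) from the central view is given below, using torch.nn.functional.grid_sample; the sign and scaling conventions of the disparity shift are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def warp_view(center, disparity, dx, dy):
        """Synthesize the view at angular offset (dx, dy) from the central
        view using the predicted disparity map (bilinear interpolation)."""
        b, _, h, w = center.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=center.device),
            torch.linspace(-1, 1, w, device=center.device), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        # Shift each pixel by disparity * offset, converted to normalized units.
        d = disparity.squeeze(1)
        shift = torch.stack((d * dx * 2.0 / w, d * dy * 2.0 / h), dim=-1)
        return F.grid_sample(center, base + shift, mode="bilinear",
                             padding_mode="border", align_corners=True)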
Step 5: the loss functions of the network model are defined. To better guide network training, loss functions specific to light field images are defined as constraints; three are mainly used. The first is the image consistency constraint, which makes the estimated image as close as possible to the original image and also contains an image quality constraint requiring consistent local similarity between the estimated image and the original image. The second is the disparity map consistency constraint, which solves the disparity map consistency problem. The third is the disparity map smoothness constraint, which prevents outliers in the estimated disparity map from degrading the accuracy of the final result. These loss functions must be redefined to suit light field images.
Because the loss function guides network optimization and measures the error between the predicted values and the true sample labels, its quality directly determines the quality of the network's final result; three special loss functions are therefore designed to guide training optimization.
Loss function 1, the image consistency constraint: it measures the difference between the estimated image and the original image; the L1 distance is used for comparison, and SSIM is used to detect image quality (the more similar the estimated image is to the original image, the closer the SSIM value is to 1). The loss function is expressed as follows:

L_image_loss = Σ_{(i,j)} [ α · (1 - SSIM(I_{i,j}, Î_{i,j})) + β · |I_{i,j} - Î_{i,j}| ]

where I_{i,j} represents the original image at viewing angle (i, j), Î_{i,j} represents the estimated image at viewing angle (i, j), and α, β and ψ are hyperparameters. The former term detects the quality and local similarity of the predicted image; the latter term detects the distance between the predicted image and the original image, i.e. the pixel-by-pixel similarity of pixel values.
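Under the assumption that α weights the SSIM term and β the L1 term (the equation above is itself a reconstruction of the patent's image-rendered formula), this constraint can be sketched with the third-party pytorch_msssim package:

    import torch
    from pytorch_msssim import ssim  # third-party SSIM implementation

    def image_loss(pred, target, alpha=0.85, beta=0.15):
        """Image consistency constraint: an SSIM quality/local-similarity
        term plus a pixel-by-pixel L1 term. alpha and beta are assumed
        weights; pred and target are (B, C, H, W) tensors in [0, 1]."""
        ssim_term = 1.0 - ssim(pred, target, data_range=1.0, size_average=True)
        l1_term = torch.mean(torch.abs(pred - target))
        return alpha * ssim_term + beta * l1_term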
Loss function 2, the disparity map consistency constraint: the invention trains a network dedicated to occlusion detection to predict the occluded parts, and also defines a loss function to constrain the consistency between disparity maps, defined as follows:

L_consist = Σ_{(x,y)} | D_{i+x,j+y} - W_{(x,y)}(D_{i,j}) |

where D_{i+x,j+y} denotes the disparity map at the (i+x, j+y) viewing angle, and W_{(x,y)}(D_{i,j}) is the disparity map at the (i+x, j+y) viewing angle obtained from the disparity map at (i, j) by warping along the vector (x, y); if the disparity estimation is correct, the two terms should be equal, i.e. consistent.
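A minimal sketch of this constraint follows; it assumes the central disparity map has already been warped to the target view (for example with the warp_view helper sketched under step 4 above):

    import torch

    def consistency_loss(warped_center_disp, view_disp):
        """L_consist: the disparity map predicted at view (i+x, j+y) should
        equal the central disparity map warped to that view; both inputs
        are (B, 1, H, W) tensors."""
        return torch.mean(torch.abs(view_disp - warped_center_disp))

    # Example over the views of the 3x3 grid around the center:
    # loss = sum(consistency_loss(warp_view(D0, D0, dx, dy), D[(dx, dy)])
    #            for (dx, dy) in offsets) / len(offsets)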
Loss function 3, the disparity map smoothness constraint: to eliminate the influence of outliers in the predicted disparity map on the result, a disparity smoothness loss function is defined as follows:

L_smooth = Σ_{(i,j)} ( |∂_x D_{i,j}| · e^{-|∂_x I_{i,j}|} + |∂_y D_{i,j}| · e^{-|∂_y I_{i,j}|} )

where ∂_x D_{i,j} and ∂_y D_{i,j} respectively represent the partial derivatives of the disparity map at (i, j) with respect to the abscissa and ordinate, and ∂_x I_{i,j} and ∂_y I_{i,j} the corresponding partial derivatives of the original image. That is, the greater the gradient of the original image at a point, the smaller the smoothness weight on the disparity map there, so the disparity map is encouraged to be smooth except at image edges. The final total loss function of the invention is as follows:
L_total = L_image_loss + L_consist + L_smooth

By defining the depth estimation of the light field image through this multi-term loss function, the result can be optimized from different aspects, making it more accurate.
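An edge-aware reading of the smoothness term, consistent with the sentence above though not verbatim from the patent, together with the total loss, can be sketched as:

    import torch

    def smooth_loss(disp, img):
        """Penalize disparity gradients, down-weighted where the image
        itself has strong gradients (edges); disp is (B, 1, H, W) and
        img is (B, C, H, W)."""
        dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
        dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
        dx_i = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), 1, keepdim=True)
        dy_i = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), 1, keepdim=True)
        return torch.mean(dx_d * torch.exp(-dx_i)) + torch.mean(dy_d * torch.exp(-dy_i))

    # L_total = image_loss(...) + consistency_loss(...) + smooth_loss(...)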
Step 6: the learning rate is dynamically optimized and adjusted so that the model always has an ideal learning rate. The initial learning rate is set to 0.0001 and is slowed down as the number of batches increases during training; the slowing mechanism is that if the loss stops decreasing within two or more training batches, the learning rate is decreased. The model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm; the momentum factor mu is dynamically adjusted with the fluctuation of the loss, starting from an initial value of 0.5. When the loss fluctuation decreases, the network is considered basically stable and mu is reduced accordingly, thereby dynamically adjusting the learning rate and refining the training process. In the middle and later stages of training, when the network tends to converge and its parameters oscillate back and forth near a local minimum, this helps the network jump out of the local region and find better network parameters.
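This schedule can be approximated with PyTorch's momentum SGD plus a plateau-based scheduler; the stand-in model, stand-in loss, reduction factors and fluctuation threshold below are assumptions:

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the depth network
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.5)
    # Reduce the learning rate once the loss stops decreasing for two batches.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2)

    prev_loss = float("inf")
    for step in range(100):
        x = torch.rand(2, 3, 64, 64)
        loss = model(x).abs().mean()             # stand-in for the multi-term loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step(loss.item())
        if abs(prev_loss - loss.item()) < 1e-4:  # loss fluctuation has shrunk,
            for g in optimizer.param_groups:     # so reduce the momentum factor mu
                g["momentum"] = max(0.1, g["momentum"] * 0.9)
        prev_loss = loss.item()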
Step 7: when the network training module trains the convolutional neural network, 60% of the data samples of the data set in step 1 are first selected as the training sample set, and a random seed is set so that the training set obtained each time is a shuffled, uniformly distributed sample. The loss functions of step 5 and the optimizer of step 6 are defined, the network parameters are adjusted and the evaluation indexes are recorded. The network model of step 2 is used as the training model to train on the data samples, and the model is saved after training so that it can be loaded later.
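The 60% split with a fixed random seed can be sketched as follows (the data set stand-in, seed value and batch size are assumptions):

    import torch
    from torch.utils.data import DataLoader, TensorDataset, random_split

    dataset = TensorDataset(torch.rand(100, 3, 64, 64))  # stand-in light field samples
    seed = 42                                            # the preset random value
    n_train = int(0.6 * len(dataset))                    # 60% for training
    train_set, test_set = random_split(
        dataset, [n_train, len(dataset) - n_train],
        generator=torch.Generator().manual_seed(seed))
    # shuffle=True yields a shuffled, uniformly distributed training stream.
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)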
Step 8: the network test module evaluates with PSNR and SSIM; these two indexes quantify image quality accuracy and present the visual quality of the synthesized images. By comparing the data predicted by the model with the test data, the accuracy index measures the estimation performance of the model, finally yielding the depth estimation accuracy of the model on the test set.
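PSNR is straightforward to compute directly, and SSIM is again available from the third-party pytorch_msssim package; a sketch:

    import torch
    from pytorch_msssim import ssim  # third-party SSIM implementation

    def psnr(pred, target, max_val=1.0):
        """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
        mse = torch.mean((pred - target) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    # Quality of a synthesized view against the real view:
    # score_psnr = psnr(synth, real)
    # score_ssim = ssim(synth, real, data_range=1.0)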

Claims (1)

1. A method for unsupervised depth estimation of monocular light field images based on a convolutional neural network, characterized by comprising the following steps:
step 1, data preprocessing:
the experimental data set is the publicly available Stanford light field image data set, captured with a Lytro Illum light field camera of real objects in the real world;
the data preprocessing comprises image brightness enhancement, horizontal/vertical flipping and random cropping;
after data preprocessing, the light field image data set is further expanded, increasing the diversity of the training samples and testing samples;
step 2, constructing models including a convolutional neural network depth estimation model and a convolutional neural network occlusion detection model;
the depth estimation model of the convolutional neural network is specifically realized as follows:
taking a ResNet50 network model as the encoder (Encode), and improving the original network with adaptive normalization on the basis of ResNet50 to suit light field images; the encoder gradually compresses the length and width of the image while increasing the number of features; let the original input image be I_{256×256×3}, where the subscript denotes the length, width and number of channels of the image, so that the intermediate results of step-by-step encoding evolve as follows:
I^E_{256×256×64} → I^E_{128×128×128} → I^E_{64×64×256} → I^E_{32×32×512} → I^E_{16×16×1024}
the decoder reverses this process, restoring the length and width of the encoder's feature maps step by step to the original image size; a dense residual structure connects the Decode and Encode processes, i.e. I^E_{32×32×512} and I^D_{32×32×512} are connected together through a skip layer;
considering that the disparity range of the light field camera lies in the interval [-4, 4], a Tanh activation function is adopted to extract the predicted disparity map; since the range of Tanh is [-1, 1], the obtained map is multiplied by 4 to recover the true disparity map; the disparity map is acquired with a 4-level pyramid structure, so that the network finally fuses disparity maps of 4 different scales;
the convolutional neural network occlusion detection model is used for learning the occlusion relations between different viewing angles; at the same time, multiple loss functions constrain the training, solving the image occlusion problem and the depth estimation consistency problem, and adaptive regularization is used in each layer; the network consists of 8 fully convolutional layers: layers 1 to 3 form the encoder that extracts features, layers 4 to 6 form the decoder that recovers the image from the features, layer 7 performs a deconvolution to obtain a super-resolution image, and the last layer downsamples back to the original image size;
step 3, in order to optimize the quality of the disparity map estimated by the network model, images at other viewing angles are synthesized from the original input image by bilinear-interpolation warping with the estimated disparity map, and the synthesized views are constrained through a loss function;
step 4, setting an optimizer: the learning rate is dynamically optimized and adjusted so that the model always has an ideal learning rate; the initial learning rate is set to 0.0001 and is slowed down as the number of batches increases during training; the slowing mechanism is as follows: the model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm, and the momentum factor mu is dynamically adjusted with the fluctuation of the loss, starting from an initial value of 0.5; when the loss fluctuation decreases, the network is considered relatively stable and mu is reduced accordingly, thereby dynamically adjusting the learning rate and refining the training process;
step 5, training a convolutional neural network:
firstly, 60% of the data samples of the data set in step 1 are selected as the training sample set, and a random seed is set so that the training set obtained each time is a shuffled, uniformly distributed sample;
secondly, defining the loss functions and the optimizer, adjusting the network parameters and recording the evaluation indexes;
finally, the network model of step 2 is used as the training model to train on the data samples, and the model is saved after training so that it can be quickly loaded later;
step 6, testing the convolutional neural network: PSNR and SSIM are used for evaluation; these two indexes quantify image quality accuracy and present the quality of the synthesized images; by comparing the data predicted by the model with the test data, the accuracy index measures the estimation performance of the model, finally yielding the depth estimation accuracy of the model on the test set;
the loss functions in the convolutional neural network are set to 3, specifically defined as follows:
the first loss function I is the image consistency constraint L_image_loss, which makes the estimated image as close as possible to the original image; it also contains an image quality constraint, which requires the estimated image and the original image to be locally similar;
the second loss function II is the disparity map consistency constraint L_consist, which solves the disparity map consistency problem and the disparity occlusion problem;
the third loss function III is the disparity map smoothness constraint L_smooth, which prevents outliers in the estimated disparity map from degrading the final result;
the loss function I is specified as follows:
the loss function I measures the difference between the estimated image and the original image; the L1 distance is used for comparison, and SSIM is used to detect image quality (the more similar the estimated image is to the original image, the closer the SSIM value is to 1); the loss function I is expressed as follows:
L_image_loss = Σ_{(i,j)} [ α · (1 - SSIM(I_{i,j}, Î_{i,j})) + β · |I_{i,j} - Î_{i,j}| ]
where I_{i,j} represents the original image at viewing angle (i, j), Î_{i,j} represents the estimated image at viewing angle (i, j), and α, β and ψ all represent hyperparameters; the first term of the formula detects the quality and local similarity of the predicted image, and the second term detects the distance between the predicted image and the original image, i.e. the pixel-by-pixel similarity of pixel values;
the loss function II is specifically defined as follows:
L_consist = Σ_{(x,y)} | D_{i+x,j+y} - W_{(x,y)}(D_{i,j}) |
where D_{i+x,j+y} denotes the disparity map at the (i+x, j+y) viewing angle, and W_{(x,y)}(D_{i,j}) denotes the disparity map at the (i+x, j+y) viewing angle obtained from the disparity map at the (i, j) viewing angle by warping along the vector (x, y);
the loss function III is defined as follows:
L_smooth = Σ_{(i,j)} ( |∂_x D_{i,j}| · e^{-|∂_x I_{i,j}|} + |∂_y D_{i,j}| · e^{-|∂_y I_{i,j}|} )
where ∂_x D_{i,j} and ∂_y D_{i,j} respectively represent the partial derivatives of the disparity map at viewing angle (i, j) with respect to the abscissa and ordinate, and ∂_x I_{i,j} and ∂_y I_{i,j} respectively represent the partial derivatives of the original image at viewing angle (i, j) with respect to the abscissa and ordinate;
the final overall loss function is as follows:
L_total = L_image_loss + L_consist + L_smooth
CN201910276356.0A 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network Active CN110163246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910276356.0A CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910276356.0A CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN110163246A CN110163246A (en) 2019-08-23
CN110163246B true CN110163246B (en) 2021-03-30

Family

ID=67638504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910276356.0A Active CN110163246B (en) 2019-04-08 2019-04-08 Monocular light field image unsupervised depth estimation method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110163246B (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503082B (en) * 2019-08-30 2024-03-12 腾讯科技(深圳)有限公司 Model training method based on deep learning and related device
CN110689060B (en) * 2019-09-16 2022-01-28 西安电子科技大学 Heterogeneous image matching method based on aggregation feature difference learning network
CN112734864A (en) * 2019-10-28 2021-04-30 天津大学青岛海洋技术研究院 Three-way convolution neural network structure for coloring gray level image
CN110956202B (en) * 2019-11-13 2023-08-01 重庆大学 Image training method, system, medium and intelligent device based on distributed learning
CN111047630B (en) * 2019-11-13 2023-06-13 芯启源(上海)半导体科技有限公司 Neural network and target detection and depth prediction method based on neural network
CN110942484B (en) * 2019-11-26 2022-07-12 福州大学 Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN111242131B (en) * 2020-01-06 2024-05-10 北京十六进制科技有限公司 Method, storage medium and device for identifying images in intelligent paper reading
CN111222481B (en) * 2020-01-14 2022-09-09 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Method and device for identifying clothes color
CN113139553B (en) * 2020-01-16 2024-07-12 中国科学院国家空间科学中心 U-net-based method and system for extracting aurora egg morphology of ultraviolet aurora image
CN111325782A (en) * 2020-02-18 2020-06-23 南京航空航天大学 Unsupervised monocular view depth estimation method based on multi-scale unification
CN113393510B (en) * 2020-03-12 2023-05-12 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN113410861B (en) * 2020-03-17 2023-01-20 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Droop control parameter optimization method suitable for multi-terminal flexible direct current system
US20210326694A1 (en) * 2020-04-20 2021-10-21 Nvidia Corporation Distance determinations using one or more neural networks
CN111476835B (en) * 2020-05-21 2021-08-10 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111597524B (en) * 2020-05-22 2021-03-23 江苏濠汉信息技术有限公司 Verification method and system for seal sample sampling personnel
CN111899295B (en) * 2020-06-06 2022-11-15 东南大学 Monocular scene depth prediction method based on deep learning
CN111833390B (en) * 2020-06-23 2023-06-20 杭州电子科技大学 Light field depth estimation method based on unsupervised deep learning
CN112633052A (en) * 2020-09-15 2021-04-09 北京华电天仁电力控制技术有限公司 Belt tearing detection method
CN112330724B (en) * 2020-10-15 2024-04-09 贵州大学 Integrated attention enhancement-based unsupervised multi-modal image registration method
CN112270692B (en) * 2020-10-15 2022-07-05 电子科技大学 Monocular video structure and motion prediction self-supervision method based on super-resolution
CN112288789B (en) * 2020-10-26 2024-03-29 杭州电子科技大学 Light field depth self-supervision learning method based on iterative optimization of shielding region
CN112215303B (en) * 2020-11-05 2022-02-11 北京理工大学 Image understanding method and system based on self-learning attribute
CN113516141B (en) * 2020-11-06 2024-03-01 阿里巴巴集团控股有限公司 Optimization method, equipment and storage medium of depth measurement model
CN112396645B (en) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112435198A (en) * 2020-12-03 2021-03-02 西安交通大学 Welding seam radiographic inspection negative image enhancement method, storage medium and equipment
CN112488033B (en) * 2020-12-10 2024-10-18 北京金山云网络技术有限公司 Data set construction method and device, electronic equipment and storage medium
CN112561818B (en) * 2020-12-14 2024-05-28 英特灵达信息技术(深圳)有限公司 Image enhancement method and device, electronic equipment and storage medium
CN112561979B (en) * 2020-12-25 2022-06-28 天津大学 Self-supervision monocular depth estimation method based on deep learning
CN112785637B (en) * 2021-01-20 2022-10-11 大连理工大学 Light field depth estimation method based on dynamic fusion network
CN112819742B (en) * 2021-02-05 2022-05-13 武汉大学 Event field synthetic aperture imaging method based on convolutional neural network
CN113361378B (en) * 2021-06-02 2023-03-10 合肥工业大学 Human body posture estimation method using adaptive data enhancement
CN113592913B (en) * 2021-08-09 2023-12-26 中国科学院深圳先进技术研究院 Method for eliminating uncertainty of self-supervision three-dimensional reconstruction
CN116127844B (en) * 2023-02-08 2023-10-31 大连海事大学 Flow field time interval deep learning prediction method considering flow control equation constraint

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
CN106612427A (en) * 2016-12-29 2017-05-03 浙江工商大学 Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method
CN108846473A (en) * 2018-04-10 2018-11-20 杭州电子科技大学 Light field depth estimation method based on direction and dimension self-adaption convolutional neural networks
CN108961327A (en) * 2018-05-22 2018-12-07 深圳市商汤科技有限公司 A kind of monocular depth estimation method and its device, equipment and storage medium
CN109191515A (en) * 2018-07-25 2019-01-11 北京市商汤科技开发有限公司 A kind of image parallactic estimation method and device, storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7599547B2 (en) * 2005-11-30 2009-10-06 Microsoft Corporation Symmetric stereo model for handling occlusion
EP2722816A3 (en) * 2012-10-18 2017-04-19 Thomson Licensing Spatio-temporal confidence maps

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
CN106612427A (en) * 2016-12-29 2017-05-03 浙江工商大学 Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method
CN108846473A (en) * 2018-04-10 2018-11-20 杭州电子科技大学 Light field depth estimation method based on direction and dimension self-adaption convolutional neural networks
CN108961327A (en) * 2018-05-22 2018-12-07 深圳市商汤科技有限公司 A kind of monocular depth estimation method and its device, equipment and storage medium
CN109191515A (en) * 2018-07-25 2019-01-11 北京市商汤科技开发有限公司 A kind of image parallactic estimation method and device, storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Depth Estima tion with Occlusion Modeling Using Light-Field Cameras";Ting-Chun Wang 等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20161130;第38卷(第11期);第2170-2181页 *
"Depth Estimation with Occlusion Handling from a Sparse Set of Light Field Views";Xiaoran Jiang 等;《2018 25th IEEE International Conference on Image Processing (ICIP)》;20180930;第633-638页 *
"EPI-Patch Based Convolutional Neural Network for Depth Estimation on 4D Light Field";Yaoxiang Luo 等;《neural information processing》;20171028;第642-652页 *
"基于卷积神经网络的光场图像深度估计技术研究";罗姚翔;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(第01期);全文 *
"基于深度线索和遮挡检测的光场相机深度估计研究";李鹏飞;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(第01期);全文 *

Also Published As

Publication number Publication date
CN110163246A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110163246B (en) Monocular light field image unsupervised depth estimation method based on convolutional neural network
CN111210435B (en) Image semantic segmentation method based on local and global feature enhancement module
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
Cherabier et al. Learning priors for semantic 3d reconstruction
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
CN115035171B (en) Self-supervision monocular depth estimation method based on self-attention guide feature fusion
CN109903315B (en) Method, apparatus, device and readable storage medium for optical flow prediction
CN110246148A (en) The conspicuousness detection method of multi-modal depth information fusion and attention study
CN114092774B (en) RGB-T image significance detection system and detection method based on information flow fusion
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN113052755A (en) High-resolution image intelligent matting method based on deep learning
CN113436220B (en) Image background estimation method based on depth map segmentation
CN112990078A (en) Facial expression generation method based on generation type confrontation network
CN110942484A (en) Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN115018888A (en) Optical flow unsupervised estimation method based on Transformer
CN112270691A (en) Monocular video structure and motion prediction method based on dynamic filter network
CN116402851A (en) Infrared dim target tracking method under complex background
CN117557779A (en) YOLO-based multi-scale target detection method
CN116485867A (en) Structured scene depth estimation method for automatic driving
CN118172832A (en) Action recognition method, training method, device, equipment and medium of action recognition model
CN118212240A (en) Automobile gear production defect detection method
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN114463187B (en) Image semantic segmentation method and system based on aggregation edge features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231212

Address after: 311400 4th floor, building 9, Yinhu innovation center, No.9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Xinmai Microelectronics Co.,Ltd.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University

TR01 Transfer of patent right