CN112085702A - Monocular depth estimation method based on sparse depth of key region - Google Patents

Monocular depth estimation method based on sparse depth of key region

Info

Publication number
CN112085702A
Authority
CN
China
Prior art keywords
depth, layer, sparse, network, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010777954.9A
Other languages
Chinese (zh)
Inventor
颜成钢
张杰华
孙垚棋
张继勇
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010777954.9A priority Critical patent/CN112085702A/en
Publication of CN112085702A publication Critical patent/CN112085702A/en
Withdrawn legal-status Critical Current

Classifications

    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06F 18/2135 Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 Combinations of networks
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 7/11 Region-based segmentation
    • G06T 7/50 Depth or shape recovery
    • G06V 10/40 Extraction of image or video features
    • G06V 10/513 Sparse representations
    • G06T 2207/10024 Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a monocular depth estimation method based on the sparse depth of key regions. The RGB images in the training set and the corresponding sparse depths are input into an encoder for feature extraction and then upsampled to obtain a predicted depth map of the same size as the input image; the loss function of the network is computed, back-propagation is performed, and the connection weights are optimized with the selected optimizer and its parameters. After multiple rounds of training the final network model is obtained, and it is finally evaluated on the test set. The method makes the sampling points more reasonable and targeted by selecting the points that are key to the neural network's depth estimation; it improves the quantitative performance of depth estimation, predicts depth more accurately and with smaller error than existing methods, and produces sharper depth maps.

Description

Monocular depth estimation method based on sparse depth of key region
Technical Field
The invention relates to the field of computer vision, in particular to a monocular depth estimation method based on a sparse depth map.
Background
Depth estimation is widely used in engineering practice, for example in autonomous driving and augmented reality; it is an important research direction in computer vision and a very active topic in recent years. There are many approaches to depth estimation. Besides monocular camera ranging, several other methods are common: lidar ranging, whose accuracy is unmatched by other methods; ranging with structured-light sensors such as the Kinect; and binocular (stereo) camera ranging. Lidar ranging is extremely accurate, but devices costing thousands of dollars are often prohibitively expensive, and laser sensors are easily affected by environmental factors such as haze. Structured-light sensors such as the Kinect have a short detection range, relatively high power consumption, and are sensitive to illumination intensity. Binocular camera ranging requires careful manual calibration.
Given the shortcomings of the above methods and the rapid development of deep learning in recent years, depth estimation using a monocular camera combined with deep-learning-based methods has attracted broad attention from researchers and advanced rapidly. Many deep-learning-based monocular depth estimation methods have been proposed in the literature; in terms of supervision information they can be roughly divided into three categories: supervised, weakly supervised and unsupervised. The input data used at present is still mainly RGB images, and although progress has been made in recent years, depth estimation from an RGB image alone is inherently an ill-posed problem, so the overall accuracy and reliability remain unsatisfactory.
Therefore, sparse depth is adopted as a supplement to the RGB image, providing reference points for the neural network when predicting depth; the sparse depth can be obtained in various ways, for example from SLAM or from a cheap lidar. This greatly improves the accuracy of monocular depth estimation.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: because of the high cost of lidar and the low accuracy and reliability of monocular depth estimation from RGB images alone, existing deep-learning-based monocular depth estimation methods cannot provide an accurate depth map.
In view of this, the present invention provides a monocular depth estimation method based on a sparse depth map. The invention extracts the sparse depth of key regions as an auxiliary input for monocular depth estimation and uses a deep neural network to extract features and generate a dense depth map. The main design points are: 1) splitting and augmenting the data set; 2) designing a network with an auto-encoder structure that performs feature extraction and upsampling on the input data; 3) extracting key regions in the RGB image and obtaining the sparse depth of these regions; 4) testing the trained model on the test set.
A monocular depth estimation method based on sparse depth of a key region comprises the following steps:
Step 1: preprocess the data set; crop, rotate and adjust the brightness of the training set, and crop the test set.
Step 2: design the network model structure.
The network model is divided into two parts, an encoder and an upsampling network. The encoder part adopts ResNet-50 with the last average pooling layer and fully connected layer removed and replaced by a convolutional layer and a normalization layer with 1 × 1 kernels. The upsampling network is divided into 6 parts: the first 4 are UpProj modules, followed by a convolutional layer with 3 × 3 kernels and a bilinear interpolation layer.
Step 3: train the network. The RGB images in the training set and the corresponding sparse depths are input into the encoder for feature extraction and then upsampled to obtain a predicted depth map pred of the same size as the input image.
Step 4: compute the loss function of the network, perform back-propagation, and optimize the connection weights with the selected optimizer and its parameters. After multiple rounds of training, the final network model is obtained.
Step 5: test the network model. The test-set images and the corresponding sparse depths are input into the trained model to obtain the predicted depth map pred, and each evaluation index is computed.
The above steps are specifically described below:
Step 1 is as follows:
Preprocessing of the training set: scale the training data, apply random horizontal flipping, apply random rotation, perform center cropping, apply color data augmentation, and finally normalize the data.
Preprocessing of the test set: scale the test data, perform center cropping, and finally normalize the data.
The network structure of step 2 is as follows:
The encoder part of the network uses ResNet-50, with the last average pooling layer and fully connected layer removed and replaced by a convolutional layer and a normalization layer with 1 × 1 kernels. The upsampling network is divided into 6 parts: the first 4 are UpProj modules, followed by a convolutional layer with 3 × 3 kernels and a bilinear interpolation layer. An UpProj module contains an upper branch and a lower branch. The input is first upsampled by an unpooling layer, then passes through a convolutional layer and a normalization layer and is activated by a ReLU function; the result is fed through the upper and lower branches separately, the two branch outputs are added, and the sum is activated by a ReLU function. The lower branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer, followed, after ReLU activation, by a convolutional layer with 3 × 3 kernels and a normalization layer. The upper branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer.
The sparse depth in step 3 is obtained as follows:
Gaussian filtering is applied to the input RGB image of the training set, using a convolution with a 3 × 3 kernel. The image edges are then extracted with a Canny operator to obtain a mask. A random number matrix s_mask of the same size as the mask, with values between 0 and 1, is generated with numpy, and a threshold prob is set so that values of s_mask smaller than prob become 0 and the rest become 1. A bitwise AND of the random number matrix s_mask and the extracted mask gives the final sparse depth mask sparse_depth_mask. An all-zero matrix sparse_depth of the same size as the depth map is generated with numpy, and for each position where sparse_depth_mask equals 1 the depth value at that position in the corresponding depth map of the data set is written into sparse_depth, yielding the final key-region sparse depth.
Further, the parameters of the Canny operator are set as follows: the ratio of the low threshold to the high threshold is 1:3, and the kernel size is 3 × 3.
The loss function L and related parameters of step 4 are as follows:
The loss function L contains three terms:
L = l_depth + l_grad + l_ssim
where l_depth is the depth error, which drives the prediction closer to the training-set Ground Truth; in its standard per-pixel (mean absolute error) form it is
l_depth = (1/n) · Σ_p |y_p − ŷ_p|
where n is the number of pixels of the image input into the neural network, ŷ_p is one pixel of the depth predicted by the neural network, and y_p is the corresponding pixel of the training-set Ground Truth.
l_grad is a gradient error that makes the edges of the generated depth map sharper and clearer:
l_grad = (1/n) · Σ_p ( |∇_x(y_p, ŷ_p)| + |∇_y(y_p, ŷ_p)| )
where ∇_x is the differential in the x direction and ∇_y is the differential in the y direction.
l_ssim is a structural similarity error, adopted so that the generated depth map has a better visual appearance:
l_ssim = (1 − SSIM(y, ŷ)) / 2
SSIM(x, y) = (2·μ_x·μ_y + c1)(2·σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2))
where μ denotes the mean, σ_x² the variance of x, σ_y² the variance of y, σ_xy the covariance of x and y, and c1, c2 are two constants.
The evaluation indices in step 5 include:
Root Mean Square Error (RMSE):
RMSE = sqrt( (1/n) · Σ_p (y_p − ŷ_p)² )
Mean Relative Error (REL):
REL = (1/n) · Σ_p |y_p − ŷ_p| / y_p
Threshold accuracy (δ_i):
δ_i = card({ ŷ_p : max(ŷ_p / y_p, y_p / ŷ_p) < 1.25^i }) / card({ y_p })
card(x) is the number of elements in a set x, and is used in the present invention to count pixels.
The invention has the following beneficial effects:
According to the invention, the sparse depth at object edges is extracted as an auxiliary input, so the network obtains depth information at key positions and can make better predictions. At the same time, adding the gradient loss and the structural similarity loss to the loss function further improves the quality of the predicted image, making the generated image sharper with clearer edges.
1. The sampling points are more reasonable and targeted, selecting the points that are key to the neural network's depth estimation.
2. The quantitative performance of depth estimation is improved; the predicted depth is more accurate and has smaller error than that of existing methods.
3. The generated depth map is sharper.
Drawings
FIG. 1 is an algorithm flow diagram;
FIG. 2 is a flow chart of edge sparse depth extraction;
FIG. 3 is a network structure diagram.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, the present invention comprises the following steps:
Step 1: preprocess the data set. Taking the NYUv2 data set as an example, the original images (an RGB image and a depth map) have size 480 × 640 and are processed as follows:
Step 1.1: the short side is scaled to 240 and the long side correspondingly to 320, reducing the resources consumed by subsequent operations.
Step 1.2: random horizontal flipping and random rotation are applied; the random rotation angle is within ±5 degrees, which increases the diversity of the data.
Step 1.3: center cropping is performed, cropping the image to 228 × 304, and the data are converted to the tensor data type.
Step 1.4: principal component analysis is applied to the data set for color augmentation; the eigenvalues are [0.2175, 0.0188, 0.0045], with their associated eigenvectors.
Step 1.5: color jitter is applied; contrast, brightness and saturation each vary within ±0.4.
Step 1.6: the data are normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].
The training set undergoes all six processing steps above; the test set undergoes only steps 1.1, 1.3 and 1.6.
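The RGB-image side of the six steps above corresponds closely to standard torchvision transforms. The following sketch is only illustrative: it assumes torchvision is used (the patent does not say so), omits the PCA color term of step 1.4, and omits the paired handling of the depth map, which must share the same flip, rotation and crop as the RGB image.

import torchvision.transforms as T

# Training-set pipeline, RGB side only (steps 1.1-1.3, 1.5, 1.6)
train_rgb_transform = T.Compose([
    T.Resize(240),                              # step 1.1: short side -> 240, long side -> 320
    T.RandomHorizontalFlip(p=0.5),              # step 1.2: random horizontal flip
    T.RandomRotation(degrees=5),                # step 1.2: random rotation within +/- 5 degrees
    T.CenterCrop((228, 304)),                   # step 1.3: center crop to 228 x 304
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # step 1.5: color jitter
    T.ToTensor(),                               # step 1.3: convert to tensor
    T.Normalize(mean=[0.485, 0.456, 0.406],     # step 1.6: normalization
                std=[0.229, 0.224, 0.225]),
])

# Test-set pipeline: only steps 1.1, 1.3 and 1.6
test_rgb_transform = T.Compose([
    T.Resize(240),
    T.CenterCrop((228, 304)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])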
Step 2. As shown in FIG. 3, the network backbone consists of an encoder and a decoder that performs the upsampling. The encoder is ResNet-50, which comprises 6 parts: a 7 × 7 convolutional layer; the four convolution modules Block1 to Block4, containing 9, 12, 18 and 9 convolutional layers respectively; and, in place of the removed final average pooling layer and fully connected layer, a convolutional layer and a normalization layer with 1 × 1 kernels. After the input enters the encoder, the first 5 parts produce a feature map of size 2048 × 8 × 10, which the 1 × 1 convolution of the 6th part reduces to a 1024 × 8 × 10 feature map.
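A rough PyTorch sketch of this encoder is given below, built from torchvision's ResNet-50 with the average-pooling and fully connected layers dropped and a 1 × 1 convolution plus normalization layer appended. Widening the first convolution to 4 input channels is an assumption, made only because the network input described later is a 4 × 228 × 304 RGB-plus-sparse-depth tensor; the class and variable names are placeholders.

import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    def __init__(self, in_channels=4):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Assumed: widen the first 7 x 7 convolution from 3 to 4 input channels
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        # Keep conv1 ... Block4; drop the average pooling and fully connected layers
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Replacement part 6: 1 x 1 convolution + normalization, 2048 -> 1024 channels
        self.reduce = nn.Sequential(
            nn.Conv2d(2048, 1024, kernel_size=1),
            nn.BatchNorm2d(1024),
        )

    def forward(self, x):            # x: (B, 4, 228, 304)
        x = self.features(x)         # -> (B, 2048, 8, 10)
        return self.reduce(x)        # -> (B, 1024, 8, 10)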
The data then enter the upsampling part, which also comprises 6 parts: the first 4 are repeated upsampling layers, namely UpProj modules, followed by a convolutional layer with 3 × 3 kernels and a bilinear interpolation layer. An UpProj module contains an upper branch and a lower branch. The input is first upsampled by an unpooling layer, then passes through a convolutional layer and a normalization layer and is activated by a ReLU function; the result is fed through the upper and lower branches separately, the two branch outputs are added, and the sum is activated. The lower branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer, followed, after ReLU activation, by a convolutional layer with 3 × 3 kernels and a normalization layer. The upper branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer.
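The UpProj block can be sketched as below. The text can be read as placing an extra convolution + normalization + ReLU between the unpooling and the two branches; the sketch instead follows the classic up-projection layout in which the branches directly follow the unpooling, and it approximates the unpooling layer with nearest-neighbour upsampling. Both choices are assumptions, as is the halving of the channel count per block, which is inferred from the dimension changes described in step 3 below.

import torch.nn as nn

class UpProj(nn.Module):
    """One up-projection block: unpool, two parallel branches, sum, ReLU."""
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch // 2                                        # channels halve per block
        self.unpool = nn.Upsample(scale_factor=2, mode='nearest')  # stand-in for the unpooling layer
        self.lower = nn.Sequential(                                # 5x5 conv + norm, ReLU, 3x3 conv + norm
            nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.upper = nn.Sequential(                                # 5x5 conv + norm
            nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.unpool(x)
        return self.relu(self.lower(x) + self.upper(x))            # add branch outputs, then ReLU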
Step 3: train the network. The RGB images in the training set and the corresponding sparse depths are input into the encoder for feature extraction and then upsampled to obtain a predicted depth map pred of the same size as the input image.
As shown in FIG. 2, the edge sparse depth is extracted as follows:
Taking the NYUv2 data set as an example, the images processed in step 1 have size 228 × 304. The RGB image in the data set is Gaussian filtered with a 3 × 3 kernel, and the image edges are extracted with a Canny operator to obtain a mask. A random number matrix s_mask of size 228 × 304 with values between 0 and 1 is generated, and a threshold prob is set so that values of s_mask smaller than prob become 0 and the rest become 1, with
prob = num_samples / N_edge
where N_edge is the number of non-zero elements in the edge-detection result; that is, the threshold is the ratio of the number of pixels to be sampled to the number of non-zero edge pixels. In this embodiment num_samples = 200. A bitwise AND of s_mask and the extracted mask gives the final sparse depth mask sparse_depth_mask. An all-zero matrix sparse_depth of size 228 × 304 is generated, and for each position where sparse_depth_mask equals 1 the depth value at that position in the corresponding depth map of the data set is written into sparse_depth, yielding the key-region sparse depth sparse_depth. The image and sparse_depth are fused into a 4-channel matrix to form the final network input, whose single-sample size is 4 × 228 × 304.
The parameters of the Canny operator are set as follows: the ratio of the low threshold to the high threshold is 1:3, and the kernel size is 3 × 3.
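An OpenCV/NumPy sketch of this key-region sampling follows. The Canny thresholds 100 and 300 are placeholders that merely respect the stated 1:3 ratio, and the random mask is built so that roughly num_samples edge pixels survive (keeping a pixel when its random value falls below prob), which is the evident intent of the sampling step; the function and variable names are illustrative.

import cv2
import numpy as np

def extract_sparse_depth(rgb, depth, num_samples=200):
    """rgb: H x W x 3 uint8 image, depth: H x W depth map (e.g. 228 x 304)."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    blurred = cv2.GaussianBlur(gray, (3, 3), 0)               # 3 x 3 Gaussian filtering
    edges = cv2.Canny(blurred, 100, 300, apertureSize=3)      # edge mask, low:high = 1:3
    mask = (edges > 0).astype(np.uint8)

    # prob = number of pixels to sample / number of non-zero edge pixels
    prob = num_samples / max(np.count_nonzero(mask), 1)
    s_mask = (np.random.rand(*mask.shape) < prob).astype(np.uint8)
    sparse_depth_mask = np.bitwise_and(s_mask, mask)           # bitwise AND with the edge mask

    sparse_depth = np.zeros_like(depth)                        # all-zero matrix of the depth map's size
    sparse_depth[sparse_depth_mask == 1] = depth[sparse_depth_mask == 1]
    return sparse_depth                                        # key-region sparse depth

The 4-channel network input can then be formed, for instance, by stacking the three RGB channels with sparse_depth along the channel axis.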
The RGB images in the training set and the corresponding sparse depths are input into the encoder for feature extraction. Each time the feature map output by the encoder passes through an upsampling layer, its number of channels is halved and its height and width are doubled; after the four upsampling layers the feature map goes from 1024 × 8 × 10 to 64 × 128 × 160. After the convolutional layer with 3 × 3 kernels and the normalization layer, followed by bilinear interpolation, the output is 1 × 228 × 304, the same size as the depth map processed in step 1.
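Under the same assumptions as the Encoder and UpProj sketches above, the whole backbone could be assembled as follows: four UpProj blocks bring the 1024 × 8 × 10 feature map to 64 × 128 × 160, a 3 × 3 convolution reduces it to one channel, and bilinear interpolation restores the 228 × 304 resolution. The class name and the use of nn.Upsample for the final interpolation are placeholders.

import torch.nn as nn

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder(in_channels=4)                 # sketch above
        self.decoder = nn.Sequential(
            UpProj(1024),                                     # -> 512 x 16 x 20
            UpProj(512),                                      # -> 256 x 32 x 40
            UpProj(256),                                      # -> 128 x 64 x 80
            UpProj(128),                                      # -> 64 x 128 x 160
            nn.Conv2d(64, 1, kernel_size=3, padding=1),       # 3 x 3 convolution to a single channel
            nn.Upsample(size=(228, 304), mode='bilinear',     # bilinear interpolation to full size
                        align_corners=False),
        )

    def forward(self, x):                                     # x: (B, 4, 228, 304)
        return self.decoder(self.encoder(x))                  # -> (B, 1, 228, 304)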
Step 4: construct the loss function L, compute the error of each forward pass, and update the weights of the neural network by back-propagation. Taking the NYUv2 data set as an example, the data set contains 50688 training samples; with the batch size set to 32, each round comprises 1584 iterations, and 20 rounds of training are performed in total. In every iteration the optimizer searches for the weights that minimize the loss function; the final weights are obtained when training ends, and by comparing the performance of the models obtained after each round the best-performing model can be saved.
The specific loss function is:
L = l_depth + l_grad + l_ssim
where l_depth is the depth error, used to bring the prediction closer to the Ground Truth; in its standard per-pixel (mean absolute error) form it is
l_depth = (1/n) · Σ_p |y_p − ŷ_p|
where n is the number of pixels of the image input into the neural network (after the processing of step 1, n = 69312 in this embodiment), ŷ_p is one pixel of the depth predicted by the neural network, and y_p is one pixel of the Ground Truth in the data set.
l_grad is a gradient error that makes the edges of the generated depth map clearer and sharper:
l_grad = (1/n) · Σ_p ( |∇_x(y_p, ŷ_p)| + |∇_y(y_p, ŷ_p)| )
where ∇_x is the differential in the x direction and ∇_y is the differential in the y direction.
l_ssim is a structural similarity error, adopted so that the generated depth map has a better visual appearance:
l_ssim = (1 − SSIM(y, ŷ)) / 2
SSIM(x, y) = (2·μ_x·μ_y + c1)(2·σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2))
where μ denotes the mean, σ_x² the variance of x, σ_y² the variance of y, σ_xy the covariance of x and y, and c1 and c2 equal 1 and 9 respectively.
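A hedged PyTorch sketch of the three-term loss is given below. Because the point-wise formulas are not fully legible in the published text, an L1 depth error and an L1 gradient error are assumed; the SSIM term uses the standard definition with c1 = 1 and c2 = 9 as stated above, computed with a local averaging window whose size (7 here) is also an assumption.

import torch
import torch.nn.functional as F

def depth_loss(pred, gt):
    # assumed L1 (mean absolute) depth error
    return torch.mean(torch.abs(pred - gt))

def grad_loss(pred, gt):
    # assumed L1 error on finite differences in the x and y directions
    diff = pred - gt
    dx = diff[..., :, 1:] - diff[..., :, :-1]
    dy = diff[..., 1:, :] - diff[..., :-1, :]
    return torch.mean(torch.abs(dx)) + torch.mean(torch.abs(dy))

def ssim_loss(pred, gt, c1=1.0, c2=9.0, win=7):
    # SSIM computed with average pooling as the local window; inputs are (B, 1, H, W)
    mu_x = F.avg_pool2d(pred, win, 1, win // 2)
    mu_y = F.avg_pool2d(gt, win, 1, win // 2)
    var_x = F.avg_pool2d(pred * pred, win, 1, win // 2) - mu_x ** 2
    var_y = F.avg_pool2d(gt * gt, win, 1, win // 2) - mu_y ** 2
    cov_xy = F.avg_pool2d(pred * gt, win, 1, win // 2) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return torch.mean((1.0 - ssim) / 2.0)

def total_loss(pred, gt):
    return depth_loss(pred, gt) + grad_loss(pred, gt) + ssim_loss(pred, gt)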
The optimizer is stochastic gradient descent (SGD) with a learning rate of 0.01, which is reduced to 10% of its value every 5 training rounds; the momentum is 0.9 and the weight decay is 0.0004.
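These optimizer settings translate directly into PyTorch; the loop below is a minimal sketch in which model, train_loader and total_loss are the placeholders defined (or assumed) in the earlier sketches.

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0004)
# learning rate drops to 10% of its value every 5 rounds
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(20):                           # 20 training rounds
    for rgbd, gt in train_loader:                 # 1584 iterations per round at batch size 32
        pred = model(rgbd)                        # forward pass
        loss = total_loss(pred, gt)               # loss from the sketch above
        optimizer.zero_grad()
        loss.backward()                           # back-propagation
        optimizer.step()
    scheduler.step()
    # after each round, evaluate and keep the best-performing model (evaluation code omitted)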
Step 5: load the model saved in step 4 and test it on the test set. For each sample in the test set the trained model produces a predicted depth map, which is compared with the Ground Truth of the test set, and each evaluation index is computed.
The training and testing environment in this embodiment is:
the system comprises the following steps: ubuntu 16.04
Cpu:Intel(R)Xeon(R)Silver 4114CPU@2.20GHz*4
Memory: 128GB
GPU:RTX2080Ti*4
The evaluation indices in this example include:
Root Mean Square Error (RMSE):
RMSE = sqrt( (1/n) · Σ_p (y_p − ŷ_p)² )
Mean Relative Error (REL):
REL = (1/n) · Σ_p |y_p − ŷ_p| / y_p
Threshold accuracy (δ_i):
δ_i = card({ ŷ_p : max(ŷ_p / y_p, y_p / ŷ_p) < 1.25^i }) / card({ y_p })
card(x) is the number of elements in a set x, here the number of pixels.
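The three indices can be computed as in the NumPy sketch below; restricting the computation to pixels with a valid (non-zero) Ground-Truth depth is an assumption, as the text does not discuss invalid pixels.

import numpy as np

def evaluate(pred, gt):
    pred, gt = pred.flatten(), gt.flatten()
    valid = gt > 0                                       # assumed: ignore missing ground truth
    pred, gt = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))            # root mean square error
    rel = np.mean(np.abs(pred - gt) / gt)                # mean relative error
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [float(np.mean(ratio < 1.25 ** i)) for i in (1, 2, 3)]   # threshold accuracies
    return rmse, rel, deltas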
The test results are:
RMSE = 0.221, REL = 0.044, δ1 = 0.972.

Claims (7)

1. A monocular depth estimation method based on the sparse depth of key regions, characterized by comprising the following steps:
step 1, preprocessing the data set: cropping, rotating and adjusting the brightness of the training set, and cropping the test set;
step 2, designing the network model structure;
the network model is divided into two parts, an encoder and an upsampling network; the encoder part adopts ResNet-50 with the last average pooling layer and fully connected layer removed and replaced by a convolutional layer and a normalization layer with 1 × 1 kernels; the upsampling network is divided into 6 parts, the first 4 being UpProj modules, followed by a convolutional layer with 3 × 3 kernels and a bilinear interpolation layer;
step 3, training the network: the RGB images in the training set and the corresponding sparse depths are input into the encoder for feature extraction and then upsampled to obtain a predicted depth map pred of the same size as the input image;
step 4, computing the loss function of the network, performing back-propagation, and optimizing the connection weights with the selected optimizer and its parameters; training for multiple rounds to obtain the final network model;
step 5, testing the network model: the test-set images and the corresponding sparse depths are input into the trained model to obtain the predicted depth map pred, and each evaluation index is computed.
2. The monocular depth estimation method based on sparse depth of key regions according to claim 1, wherein step 1 specifically comprises:
preprocessing of the training set: scaling the training data, then applying random horizontal flipping, random rotation, center cropping and color data augmentation, and finally normalizing the data;
preprocessing of the test set: scaling the test data, then performing center cropping, and finally normalizing the data.
3. The monocular depth estimation method based on sparse depth of key regions according to claim 2, wherein the network structure of step 2 is as follows:
the encoder part of the network adopts ResNet-50, with the last average pooling layer and fully connected layer removed and replaced by a convolutional layer and a normalization layer with 1 × 1 kernels; the upsampling network is divided into 6 parts, the first 4 being UpProj modules, followed by a convolutional layer with 3 × 3 kernels and a bilinear interpolation layer; an UpProj module contains an upper branch and a lower branch: the input is first upsampled by an unpooling layer, then passes through a convolutional layer and a normalization layer and is activated by a ReLU function; the result is fed through the upper and lower branches separately, the two branch outputs are added, and the sum is activated by a ReLU function; the lower branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer, followed, after ReLU activation, by a convolutional layer with 3 × 3 kernels and a normalization layer; the upper branch consists of a convolutional layer with 5 × 5 kernels and a normalization layer.
4. The monocular depth estimation method based on sparse depth of key regions according to claim 1, wherein the sparse depth of step 3 is obtained as follows:
Gaussian filtering is applied to the input RGB image of the training set, using a convolution with a 3 × 3 kernel; the image edges are extracted with a Canny operator to obtain a mask; a random number matrix s_mask of the same size as the mask, with values between 0 and 1, is generated with numpy, and a threshold prob is set so that values of s_mask smaller than prob become 0 and the rest become 1; a bitwise AND of the random number matrix s_mask and the extracted mask gives the final sparse depth mask sparse_depth_mask; an all-zero matrix sparse_depth of the same size as the depth map is generated with numpy, and for each position where sparse_depth_mask equals 1 the depth value at that position in the corresponding depth map of the data set is written into sparse_depth, yielding the final key-region sparse depth.
5. The method of claim 3, wherein the loss function L and related parameters in step 4 are as follows:
the loss function L contains three terms:
L = l_depth + l_grad + l_ssim
wherein l_depth is the depth error, used to bring the prediction closer to the training-set Ground Truth, and is defined as
l_depth = (1/n) · Σ_p |y_p − ŷ_p|
where n is the number of pixels of the image input into the neural network, ŷ_p is one pixel of the depth predicted by the neural network, and y_p is a pixel of the training-set Ground Truth;
l_grad is a gradient error used to make the edges of the generated depth map sharper and clearer, and is defined as
l_grad = (1/n) · Σ_p ( |∇_x(y_p, ŷ_p)| + |∇_y(y_p, ŷ_p)| )
where ∇_x is the differential in the x direction and ∇_y is the differential in the y direction;
l_ssim is a structural similarity error, adopted so that the generated depth map has a better display effect:
l_ssim = (1 − SSIM(y, ŷ)) / 2
SSIM(x, y) = (2·μ_x·μ_y + c1)(2·σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2))
wherein μ denotes the mean, σ_x² the variance of x, σ_y² the variance of y, σ_xy the covariance of x and y, and c1, c2 are two constants.
6. The method according to claim 5, wherein the evaluation indices in step 5 comprise:
Root Mean Square Error (RMSE):
RMSE = sqrt( (1/n) · Σ_p (y_p − ŷ_p)² )
Mean Relative Error (REL):
REL = (1/n) · Σ_p |y_p − ŷ_p| / y_p
Threshold accuracy (δ_i):
δ_i = card({ ŷ_p : max(ŷ_p / y_p, y_p / ŷ_p) < 1.25^i }) / card({ y_p })
where card(x) is the number of elements in a set x, and is used in the present invention to count pixels.
7. The method according to claim 4, wherein the parameters of the Canny operator are set as follows: the ratio of the low threshold to the high threshold is 1:3, and the kernel size is 3 × 3.
CN202010777954.9A 2020-08-05 2020-08-05 Monocular depth estimation method based on sparse depth of key region Withdrawn CN112085702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010777954.9A CN112085702A (en) 2020-08-05 2020-08-05 Monocular depth estimation method based on sparse depth of key region

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010777954.9A CN112085702A (en) 2020-08-05 2020-08-05 Monocular depth estimation method based on sparse depth of key region

Publications (1)

Publication Number Publication Date
CN112085702A true CN112085702A (en) 2020-12-15

Family

ID=73736029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010777954.9A Withdrawn CN112085702A (en) 2020-08-05 2020-08-05 Monocular depth estimation method based on sparse depth of key region

Country Status (1)

Country Link
CN (1) CN112085702A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627351A (en) * 2022-02-18 2022-06-14 电子科技大学 Fusion depth estimation method based on vision and millimeter wave radar
CN114627351B (en) * 2022-02-18 2023-05-16 电子科技大学 Fusion depth estimation method based on vision and millimeter wave radar


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201215)