CN111369608A - Visual odometry method based on image depth estimation - Google Patents
Visual odometry method based on image depth estimation
- Publication number: CN111369608A
- Application number: CN202010478460.0A
- Authority: CN (China)
- Prior art keywords: image, depth, estimation, loss, pose
- Prior art date: 2020-05-29
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06T7/50 — Image analysis: depth or shape recovery
- G06N3/044 — Neural networks: recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks: combinations of networks
- G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
- G06T7/11 — Segmentation: region-based segmentation
- G06T7/20 — Image analysis: analysis of motion
Abstract
The invention discloses a visual odometry method based on image depth estimation, addressing the scale-ambiguity problem common to monocular visual odometry with an algorithm that combines depth images and monocular images to enforce a scale-consistency constraint. In the network design, long short-term memory units are fused with convolutional units; a depth estimation network is trained on monocular images, with a photometric consistency loss introduced into the loss function and a smoothness loss added so that more image features are captured and more accurate depth images are produced. The estimated depth image is then combined with the original monocular image to realize the scale-consistency constraint, and the pose estimation network is trained. Experiments and result analysis on the depth estimation network and the pose estimation network show that visual odometry combined with depth image estimation can, to a certain extent, solve the scale-ambiguity problem of monocular visual odometry.
Description
Technical Field
The invention relates to the technical field of visual odometry, in particular to a visual odometry method based on image depth estimation.
Background
The input to monocular visual odometry (i.e., estimating the ego-motion of a vehicle or robot from an image sequence taken from a single viewpoint) is RGB images; however, in computer vision and robotics, depth image (depth map) information also provides vital cues for applications such as autonomous driving, virtual reality (VR), and augmented reality (AR). A depth image, also known as a range image, is an image or image channel that encodes the distance from the viewpoint to object surfaces in the scene; each pixel value represents the actual distance between the sensor and the object. Compared with binocular (stereo) visual odometry, traditional monocular visual odometry suffers from an obvious shortcoming in pose estimation: scale ambiguity. Scale ambiguity means that a monocular visual odometer cannot determine the absolute length of translational motion, i.e., the scale factor, from correlations between features alone. Most existing remedies fuse the image measurements with information from other sensors, such as an inertial navigation system (INS) or GNSS. Although additional sensors can resolve the scale ambiguity, they sacrifice one of the greatest advantages of the monocular configuration: small size and low cost.
Moreover, most methods based on convolutional neural networks (CNNs) treat depth estimation as a single-view task, neglecting the important temporal information in monocular or binocular video. Single-view depth estimation is motivated by the human ability to perceive depth from a single image, but it ignores motion, which is even more important to humans when inferring distance; in addition, moving objects violate the static-scene assumption made during geometric image reconstruction in monocular visual odometry, degrading its performance.
Disclosure of Invention
1. Network architecture design
The network architecture comprises neural networks operating on the monocular depth image and the monocular RGB image; the depth image estimation network and the ego-motion estimation network are trained on a monocular image frame sequence to realize the scale-consistency constraint. The whole process comprises the following steps (a code sketch of this loop follows the list):
S1. For two given consecutive image frames I_t and I_{t+1}, estimate depth separately with the depth estimation network to obtain the corresponding depth images D_t and D_{t+1};
S2. Feed the original image I_t together with its estimated depth image D_t jointly as input to the ego-motion estimation network, which outputs the camera pose prediction at time t;
S3. Convert the estimated pose into a 4×4 pose transformation matrix T, warp the depth image D_t according to T into the next frame, and compute the inconsistency between the warped depth and D_{t+1} as a consistency loss for model training, improving the scale consistency of pose prediction, as shown in Fig. 1.
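The following is a minimal PyTorch-style sketch of this three-step training loop. The depth_net/pose_net modules, the Euler-angle pose parameterization, and the loss-function interface are illustrative assumptions, not components specified by the patent:

```python
import torch

def pose_vec_to_mat(p):
    # Hypothetical helper: 6-DoF vector (tx, ty, tz, rx, ry, rz) -> 4x4
    # transform, with rotation R = Rz @ Ry @ Rx from Euler angles.
    t, r = p[:, :3], p[:, 3:]
    cx, cy, cz = torch.cos(r[:, 0]), torch.cos(r[:, 1]), torch.cos(r[:, 2])
    sx, sy, sz = torch.sin(r[:, 0]), torch.sin(r[:, 1]), torch.sin(r[:, 2])
    R = torch.stack([
        cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx,
        sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx,
        -sy,     cy * sx,                cy * cx,
    ], dim=1).reshape(-1, 3, 3)
    T = torch.eye(4, device=p.device).repeat(p.shape[0], 1, 1)
    T[:, :3, :3], T[:, :3, 3] = R, t
    return T

def training_step(depth_net, pose_net, loss_fn, I_t, I_t1, K):
    # S1: estimate depth for each of the two consecutive frames.
    D_t = depth_net(I_t)      # (B, 1, H, W)
    D_t1 = depth_net(I_t1)    # (B, 1, H, W)
    # S2: the pose network sees the RGB frame jointly with its depth estimate.
    pose6d = pose_net(torch.cat([I_t, D_t], dim=1))   # (B, 6)
    # S3: build the 4x4 transform; the loss warps D_t with T (and intrinsics K)
    # and penalizes its inconsistency with D_t1 (scale-consistency constraint).
    T = pose_vec_to_mat(pose6d)                       # (B, 4, 4)
    return loss_fn(I_t, I_t1, D_t, D_t1, T, K)
```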
2. Depth estimation network
The depth estimation network adopts a self-encoding/decoding U-shaped architecture, as shown in Fig. 2. The invention fuses a recurrent neural unit with the encoder units to form a convolutional long short-term memory unit serving as the self-encoding part of the network, so that spatial and temporal information are exploited simultaneously. The spatio-temporal features computed by the encoder are then fed into the decoder network for accurate depth image estimation and reconstruction; the decoder fuses low-level feature representations from different levels of the encoder through skip connections. Fig. 3 shows the specific parameter settings of the depth estimation network architecture.
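As a rough illustration of how a recurrent unit can be fused into the encoder of such a U-shaped network, the following is a toy two-level PyTorch sketch; the real layer counts and channel widths are those of Fig. 3, not the ones shown here:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # Recurrent unit fused with a convolutional encoder stage.
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class DepthUNet(nn.Module):
    # Toy two-level U-shaped network with a ConvLSTM bottleneck.
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.lstm = ConvLSTMCell(64, 64)   # spatio-temporal bottleneck
        self.up = nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
        self.head = nn.Conv2d(64, 1, 3, padding=1)  # 64 = 32 decoder + 32 skip

    def forward(self, x, state=None):
        e1 = self.enc1(x)                  # full-resolution low-level features
        e2 = self.enc2(e1)                 # downsampled encoder features
        if state is None:                  # zero state before the first frame
            z = torch.zeros_like(e2)
            state = (z, z)
        h, state = self.lstm(e2, state)    # fuse temporal context across frames
        d = torch.relu(self.up(h))
        d = torch.cat([d, e1], dim=1)      # skip connection from the encoder
        return nn.functional.softplus(self.head(d)), state  # positive depth
```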
3. Pose estimation network
The neural network for pose estimation uses a VGG16 convolutional neural network architecture fused with a recurrent neural unit. The visual odometry network has the following characteristics: 1) in this scheme, the input to the visual odometer includes the depth image information of the current frame, which guarantees scale consistency of the scene between depth and pose; 2) the input to the visual odometer is the joint representation of the image frame and the corresponding depth image at a single time point, while information from previous frames is stored in the hidden state; 3) the visual odometry network maintains the same scene scale when run over the entire image sequence.
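A compact PyTorch sketch of a pose network with these three properties is given below; the VGG16 stack is abbreviated to a few strided convolutions, and all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    # VGG-style conv stack over the joint RGB+depth input, followed by an
    # LSTM whose hidden state carries the previous frames' information.
    def __init__(self, hid=256):
        super().__init__()
        chans = [4, 64, 128, 256, 256]     # 4 input channels = 3 RGB + 1 depth
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU()]
        self.cnn = nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1))
        self.rnn = nn.LSTM(input_size=256, hidden_size=hid, batch_first=True)
        self.fc = nn.Linear(hid, 6)        # 3-D position + 3-D attitude

    def forward(self, rgb, depth, state=None):
        x = torch.cat([rgb, depth], dim=1)       # joint representation at time t
        f = self.cnn(x).flatten(1).unsqueeze(1)  # (B, 1, 256): one time step
        out, state = self.rnn(f, state)          # hidden state spans the sequence
        return self.fc(out[:, -1]), state        # 6-DoF pose prediction
```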
4. Loss function
A photometric consistency loss between the predicted depth image and the known depth image data is computed to train the depth estimation neural network in a supervised manner. Because the photometric term provides little information in low-texture environments, a smoothness loss is also added to the depth estimation. In the visual odometry part, the pose estimation loss is computed from the pose estimated by the network and the ground-truth values provided in the dataset, realizing supervised training of the pose estimation network. A geometric consistency loss is further introduced: the depth image estimated for the previous frame is warped according to the pose transformation matrix, and its difference from the depth image estimated for the next frame is computed. The overall objective loss function is calculated as follows:

  L = α·L_pc + β·L_s + γ·L_pose + μ·L_gc

where L_pc and L_s denote the photometric consistency loss and the smoothness loss respectively, L_pose denotes the pose estimation loss, and L_gc denotes the geometric consistency loss. A corresponding weight parameter is added for each loss category to balance the scale of each loss calculation result, and a parameter is also added to control the degree of smoothing of the depth image.
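In code, the weighted combination might look as follows (a sketch; the weight values are illustrative assumptions, not values disclosed by the patent):

```python
def total_loss(l_pc, l_s, l_pose, l_gc, alpha=1.0, beta=0.1, gamma=1.0, mu=0.5):
    # Weighted sum of the four loss terms; beta also controls how strongly
    # the depth map is smoothed.
    return alpha * l_pc + beta * l_s + gamma * l_pose + mu * l_gc
```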
4.1 Photometric consistency loss and smoothness loss
The brightness-constancy and spatial-smoothness priors used in dense correspondence algorithms are adopted: the photometric difference between the estimated depth image and the actually acquired depth image information is computed and used as the loss function for network training. The photometric consistency loss function is expressed as:

  L_pc = (1/|V|) Σ_{p∈V} d( D̂(p), D(p) )

where |V| denotes the number of pixels in the image and V denotes the set of all pixels in the image. The L1 norm is selected for the distance d in this loss function. The L1 norm loss, also called least absolute deviation or least absolute error, is computed as the sum of the absolute values of the differences between the estimated values and the target values, which is then minimized. Compared with the L2 norm loss, which sums the squared differences, the L1 loss is more robust to outliers. The L1 term in the photometric consistency difference is computed as:

  d( D̂(p), D(p) ) = | D̂(p) − D(p) |
the luminance loss is less in the information quantity provided when the scene is uniformly distributed and the texture is less, more information is generated by calculating multiple differences, and the calculation of smoothness loss is introduced, so that the network can more sensitively sense the edge information in the image, and the accuracy of the output result in a low-texture environment is ensured; the smoothness loss is calculated as follows:
whereinRepresenting the first derivative along the spatial direction by which it is ensured that the smoothness is guided by edges in the image.
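A PyTorch sketch of both terms, assuming depth maps of shape (B, 1, H, W) and images of shape (B, 3, H, W):

```python
import torch

def photometric_l1_loss(d_pred, d_gt):
    # L1 (least absolute deviation) between estimated and acquired depth,
    # averaged over the |V| pixels of the image.
    return (d_pred - d_gt).abs().mean()

def smoothness_loss(depth, image):
    # Edge-aware smoothness: squared first derivatives of the depth map,
    # down-weighted where the image gradient (an edge) is strong.
    dD_dx = depth[..., :, 1:] - depth[..., :, :-1]
    dD_dy = depth[..., 1:, :] - depth[..., :-1, :]
    dI_dx = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    dI_dy = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return ((torch.exp(-dI_dx) * dD_dx) ** 2).mean() + \
           ((torch.exp(-dI_dy) * dD_dy) ** 2).mean()
```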
4.2 Pose estimation loss
The pose estimation loss represents the estimated absolute pose as a six-dimensional vector composed of a three-dimensional vector for position and a three-dimensional vector for attitude. The provided ground-truth pose vector (x, φ) is fitted against the estimated pose vector (x̂, φ̂), and the error between the two is used as the loss function of pose estimation:

  L_pose = ‖ x̂ − x ‖₂ + β · ‖ φ̂ − φ ‖₂

where the parameter β is a scale factor that balances the displacement error against the rotation error.
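A sketch of this loss for batched six-dimensional pose tensors (the value of beta is an illustrative assumption):

```python
def pose_loss(pose_pred, pose_gt, beta=10.0):
    # Six-dimensional pose: entries 0-2 are position, 3-5 are attitude;
    # beta balances displacement error against rotation error.
    t_err = (pose_pred[:, :3] - pose_gt[:, :3]).norm(dim=1)
    r_err = (pose_pred[:, 3:] - pose_gt[:, 3:]).norm(dim=1)
    return (t_err + beta * r_err).mean()
```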
4.3 Geometric consistency loss
The geometric consistency loss enhances the geometric consistency of the predicted results: the depth images D_t and D_{t+1} of two adjacent frames are required to conform to the same scene structure, and the difference between them is minimized. This improves geometric consistency between sample images of the same training batch, and through transitivity realizes geometric consistency over the whole image sequence. For example, if the depth images of I_t and I_{t+1} are kept consistent in one training batch, and those of I_{t+1} and I_{t+2} are consistent in another batch, then the consistency of the depth images of I_t and I_{t+2} can be ensured even though they need not belong to the same batch; in this way the consistency of the depth images of the whole image sequence is realized. In the training process, the pose estimation network and the depth estimation network are thus naturally coupled and can generate scale-consistent predictions over the whole image sequence. Following this constraint, the inconsistency between depth images of adjacent frames is computed: for any pixel p in the depth image, the adjacent-frame depth difference D_diff(p) is defined as:

  D_diff(p) = | D^a_{t+1}(p) − D_{t+1}(p) | / ( D^a_{t+1}(p) + D_{t+1}(p) )

where D_{t+1} is the depth image computed by the depth estimation neural network for the image frame at time t+1, and D^a_{t+1} is the depth image obtained by warping the depth estimate D_t of the frame at time t according to the pose transformation matrix T output by the ego-motion estimation network for the motion from the current time to the next, i.e., D_t transformed into frame t+1.
Because the camera is in continuous motion, the captured scene changes continuously; the validity of the pixels entering the inconsistency computation is ensured by cropping the depth image, and the per-pixel values D_diff(p) are summed to normalize the depth-image difference. During optimization, points at different absolute depths are treated equally, which is more intuitive than working with absolute distances; the function is symmetric and its value range lies between 0 and 1, which ensures stable training. From this inconsistency map, the proposed geometric consistency loss is defined as:

  L_gc = (1/|V|) Σ_{p∈V} D_diff(p)

where V denotes all pixels remaining after the matrix transformation and cropping of the depth image, and |V| denotes the number of pixels in V. This formulation guarantees scale consistency between adjacent image pairs by minimizing the geometric distance of the predicted depths, and propagates that consistency to the whole image sequence through training. The ego-motion estimation network and the depth estimation network are thereby closely linked, and the ego-motion network can ultimately predict globally scale-consistent trajectories.
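A sketch of the loss given an already-warped depth map; the warping itself (projecting D_t into frame t+1 with the intrinsics and the 4×4 transform) is assumed to be computed elsewhere:

```python
def geometric_consistency_loss(d_warp, d_next, valid):
    # d_warp: D_t warped into frame t+1 by the predicted pose; d_next: the
    # depth predicted directly at t+1; valid: boolean mask of pixels kept
    # after cropping. The per-pixel term is symmetric and lies in [0, 1).
    d_diff = (d_warp - d_next).abs() / (d_warp + d_next)
    return d_diff[valid].mean()
```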
Advantageous effects
The invention discloses a visual odometry method based on image depth estimation, addressing the scale-ambiguity problem common to monocular visual odometry with an algorithm that combines depth images and monocular images to enforce a scale-consistency constraint. In the network design, long short-term memory units are fused with convolutional units; a depth estimation network is trained on monocular images, with a photometric consistency loss introduced into the loss function and a smoothness loss added so that more image features are captured and more accurate depth images are produced. The estimated depth image is then combined with the original monocular image to realize the scale-consistency constraint, and the pose estimation network is trained. Experiments and result analysis on the depth estimation network and the pose estimation network show that visual odometry combined with depth image estimation can, to a certain extent, solve the scale-ambiguity problem of monocular visual odometry.
Drawings
FIG. 1 is a diagram of a visual odometry network architecture incorporating depth image estimation.
Fig. 2 is a structural design diagram of a depth estimation network.
Fig. 3 is a parameter setting diagram of the depth estimation network.
Fig. 4 is a pose estimation network architecture diagram.
Fig. 5 is a parameter setting diagram of the pose estimation network.
FIG. 6 is a graph of test results under the Eigen split data set.
Fig. 7 is a graph of test results under the KITTI Odometry data set.
Fig. 8 is the trajectory reconstruction result of the pose estimation network model on sequence 01.
Fig. 9 is the trajectory reconstruction result of the pose estimation network model on sequence 05.
Fig. 10 is the trajectory reconstruction result of the pose estimation network model on sequence 09.
Detailed Description
1. Introduction to data set
Experimental results are presented to analyze and evaluate the performance of the proposed framework and to compare it with prior work on depth estimation and visual odometry pose estimation. Training is performed mainly on the KITTI raw dataset, which was captured at 10 Hz and comprises raw (unsynchronized and unrectified) as well as synchronized and rectified binocular color and grayscale image sequences, 3D point cloud information (about 100,000 points per frame, stored as binary floating-point matrices), 3D GPS/IMU data (txt files containing position, velocity, acceleration, and meta information), the related camera calibration data, and 3D object annotations. The full dataset contains 61 video sequences. For the monocular depth image estimation experiments, both the Eigen split and the KITTI Odometry split are used. For evaluating the visual odometry part, the experiments are based on the KITTI Odometry dataset, and the network model is trained on the dataset's images combined with the depth images generated by the depth estimation network. Note that the Eigen and Odometry splits overlap; the two splits are described below.
2. Dataset split methods
In the Eigen split, a total of 697 images selected from 28 scene sequences form the test set for monocular depth estimation. The remaining 33 scene sequences provide 23,488 binocular image pairs as the training set, with the two stereo images treated as images acquired by two separate monocular cameras. Since image reprojection loss depends on the parallax induced by motion, all static frames whose motion is less than 0.3 meters are discarded during the data preparation stage.
The Odometry dataset contains 11 image sequences with ground-truth camera poses. When evaluating the pose estimation network, sequences 00-08 (excluding 03) are used as the training set and sequences 09-10 as the test set.
3. Results and analysis of the experiments
Fig. 6 shows the depth estimation results output for different input images; the model produces accurate depth estimates across different scenes. To demonstrate the robustness of the model, Fig. 7 shows two images of the same object in the same scene under different illumination conditions (the circled vehicle is under direct sunlight in one image and in tree shadow in the other); the corresponding outputs show that the model still detects the object in the image accurately under different illumination conditions.
When evaluating the performance of the depth estimation neural network against existing methods, both test-set splits, the Eigen split and the KITTI split, are used. The comparison metrics fall into an error part and an accuracy part: the error metrics comprise the absolute relative error (Abs Rel), squared relative error (Sq Rel), root mean square error (RMSE), and root mean square logarithmic error (RMSE log), where smaller values indicate better performance; the accuracy metrics comprise the threshold accuracies δ < 1.25, δ < 1.25², and δ < 1.25³, where larger values indicate better performance. In the KITTI split, the test set contains a total of 200 images acquired from 28 different scenes, each with corresponding ground-truth data. Figs. 6 and 7 show the test results of the depth estimation network under the Eigen split and the KITTI Odometry split respectively, compared with existing methods.
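For reference, the standard metrics above can be computed as follows (a NumPy sketch over matched predicted/ground-truth arrays of valid pixels):

```python
import numpy as np

def depth_metrics(pred, gt):
    # Error metrics (lower is better) and threshold accuracies (higher is
    # better) over positive-valued, matched depth arrays.
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    acc = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return abs_rel, sq_rel, rmse, rmse_log, acc
```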
After the depth estimation network has been trained, the pose estimation network, i.e., the visual odometry part, is trained by combining the output depth images with the original RGB images. Figs. 8-10 show the final trajectory reconstruction results, which indicate that visual odometry combined with depth image estimation can, to a certain extent, solve the scale-ambiguity problem of monocular visual odometry.
It is to be noted that, in the present invention, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Likewise, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A visual odometry method based on image depth estimation, characterized in that: the network architecture comprises neural networks operating on the monocular depth image and the monocular RGB image; the depth image estimation network and the ego-motion estimation network are trained on a monocular image frame sequence to realize the scale-consistency constraint; and the whole process comprises the following steps:
S1. for two given consecutive image frames I_t and I_{t+1}, estimating depth separately with the depth estimation network to obtain the corresponding depth images D_t and D_{t+1};
S2. feeding the original image I_t together with its estimated depth image D_t jointly as input to the ego-motion estimation network, which outputs the camera pose prediction at time t;
S3. converting the predicted pose into a 4×4 pose transformation matrix T, warping the depth image D_t according to T into the next frame, and computing the inconsistency between the warped depth and D_{t+1} as a consistency loss for model training, thereby improving the scale consistency of pose prediction.
2. The visual odometry method based on image depth estimation according to claim 1, characterized in that: the depth estimation network uses a self-encoding/decoding U-shaped architecture; a recurrent neural unit is fused with the encoder units to form a convolutional long short-term memory unit serving as the self-encoding part of the network, so that spatial and temporal information are utilized simultaneously; the spatio-temporal features computed by the encoder are input into the decoder network for accurate depth image estimation and reconstruction, and the decoder fuses low-level feature representations from different levels of the encoder through skip connections.
3. The visual odometry method based on image depth estimation according to claim 1, characterized in that: the neural network for pose estimation uses a VGG16 convolutional neural network architecture fused with a recurrent neural unit, and the visual odometry network has the following characteristics: 1) the input of the visual odometer includes the depth image information of the current frame, which guarantees the scale consistency of the scene between depth and pose; 2) the input of the visual odometer is the joint representation of the image frame and the corresponding depth image at a single time point, while information from previous frames is stored in the hidden state; 3) the visual odometry network maintains the same scene scale when run over the entire image sequence.
4. The visual odometry method based on image depth estimation according to claim 1, characterized in that: a photometric consistency loss between the predicted depth image and the known depth image data is computed to train the depth estimation neural network in a supervised manner; since the photometric term provides little information in low-texture environments, a smoothness loss is also added to the depth estimation; in the visual odometry part, the pose estimation loss is computed from the pose estimated by the network and the ground-truth values provided in the dataset, realizing supervised training of the pose estimation network; a geometric consistency loss is introduced, in which the depth image estimated for the previous frame is warped according to the pose transformation matrix and its difference from the depth image estimated for the next frame is computed; the overall objective loss function is calculated as follows:

  L = α·L_pc + β·L_s + γ·L_pose + μ·L_gc

where L_pc and L_s denote the photometric consistency loss and the smoothness loss respectively, L_pose denotes the pose estimation loss, and L_gc denotes the geometric consistency loss; a corresponding weight parameter is added for each loss category to balance the scale of each loss calculation result, and a parameter is also added to control the degree of smoothing of the depth image.
5. The visual odometry method based on image depth estimation according to claim 4, characterized in that: for the photometric consistency loss and the smoothness loss, the brightness-constancy and spatial-smoothness priors used in dense correspondence algorithms are adopted; the photometric difference between the estimated depth image and the actually acquired depth image information is computed and used as the loss function for network training, and the photometric consistency loss function is expressed as:

  L_pc = (1/|V|) Σ_{p∈V} d( D̂(p), D(p) )

where |V| denotes the number of pixels in the image and V denotes the set of all pixels in the image; the L1 norm is selected for the distance d in this loss function; the L1 norm loss, also called least absolute deviation or least absolute error, is computed as the sum of the absolute values of the differences between the estimated values and the target values, which is then minimized; compared with the L2 norm loss, which sums the squared differences, the L1 loss is more robust to outliers, and the L1 term in the photometric consistency difference is computed as:

  d( D̂(p), D(p) ) = | D̂(p) − D(p) |

the photometric loss provides little information when the scene is evenly distributed and weakly textured; to obtain more information from multiple differences, a smoothness loss is introduced, which makes the network more sensitive to edge information in the image and ensures accurate output in low-texture environments; the smoothness loss is calculated as:

  L_s = Σ_{p∈V} ( e^{−∇I(p)} · ∇D(p) )²
6. The visual odometry method based on image depth estimation according to claim 4, characterized in that: the pose estimation loss represents the estimated absolute pose as a six-dimensional vector composed of a three-dimensional vector for position and a three-dimensional vector for attitude; the provided ground-truth pose vector (x, φ) is fitted against the estimated pose vector (x̂, φ̂), and the error between the two is used as the loss function of pose estimation:

  L_pose = ‖ x̂ − x ‖₂ + β · ‖ φ̂ − φ ‖₂
7. The visual odometry method based on image depth estimation according to claim 4, characterized in that: the geometric consistency loss enhances the geometric consistency of the predicted results, requiring the depth images D_t and D_{t+1} of two adjacent frames to conform to the same scene structure and minimizing the difference between the two; geometric consistency between sample images of the same training batch is thereby improved, and through transitivity geometric consistency of the whole image sequence is realized; for example, if the depth images of I_t and I_{t+1} are kept consistent in one training batch and the depth images of I_{t+1} and I_{t+2} are consistent in another batch, then the consistency of the depth images of I_t and I_{t+2} can be ensured even though they need not belong to the same batch, realizing the consistency of the depth images of the whole image sequence; in the training process, the pose estimation network and the depth estimation network are naturally coupled and can generate scale-consistent predictions over the whole image sequence; following this constraint, the inconsistency between depth images of adjacent frames is computed: for any pixel p in the depth image, the adjacent-frame depth difference D_diff(p) is defined as:

  D_diff(p) = | D^a_{t+1}(p) − D_{t+1}(p) | / ( D^a_{t+1}(p) + D_{t+1}(p) )

where D_{t+1} is the depth image computed by the depth estimation neural network for the image frame at time t+1, and D^a_{t+1} is the depth image obtained by warping the depth estimate D_t of the frame at time t according to the pose transformation matrix T output by the ego-motion estimation network for the motion from the current time to the next, i.e., D_t transformed into frame t+1;
because the camera is in continuous motion, the captured scene changes continuously; the validity of the pixels entering the inconsistency computation is ensured by cropping the depth image, and the per-pixel values D_diff(p) are summed to normalize the depth-image difference; during optimization, points at different absolute depths are treated equally, which is more intuitive than working with absolute distances; the function is symmetric and its value range lies between 0 and 1, ensuring stable training; from this inconsistency map, the proposed geometric consistency loss is defined as:

  L_gc = (1/|V|) Σ_{p∈V} D_diff(p)

where V denotes all pixels remaining after the matrix transformation and cropping of the depth image, and |V| denotes the number of pixels in V; this formulation guarantees scale consistency between adjacent image pairs by minimizing the geometric distance of the predicted depths, and propagates that consistency to the whole image sequence through training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010478460.0A CN111369608A (en) | 2020-05-29 | 2020-05-29 | Visual odometry method based on image depth estimation
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010478460.0A CN111369608A (en) | 2020-05-29 | 2020-05-29 | Visual odometry method based on image depth estimation
Publications (1)
Publication Number | Publication Date |
---|---|
CN111369608A (en) | 2020-07-03
Family
ID=71211134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010478460.0A Pending CN111369608A (en) | 2020-05-29 | 2020-05-29 | Visual odometer method based on image depth estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111369608A (en) |
- 2020-05-29: Application CN202010478460.0A filed; published as CN111369608A (status: Pending)
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111899280A (en) * | 2020-07-13 | 2020-11-06 | 哈尔滨工程大学 | Monocular vision odometer method adopting deep learning and mixed pose estimation |
CN111899280B (en) * | 2020-07-13 | 2023-07-25 | 哈尔滨工程大学 | Monocular vision odometer method adopting deep learning and mixed pose estimation |
CN112052626A (en) * | 2020-08-14 | 2020-12-08 | 杭州未名信科科技有限公司 | Automatic neural network design system and method |
CN112052626B (en) * | 2020-08-14 | 2024-01-19 | 杭州未名信科科技有限公司 | Automatic design system and method for neural network |
CN112102399A (en) * | 2020-09-11 | 2020-12-18 | 成都理工大学 | Visual mileage calculation method based on generative antagonistic network |
CN112102399B (en) * | 2020-09-11 | 2022-07-19 | 成都理工大学 | Visual mileage calculation method based on generative antagonistic network |
CN112150531A (en) * | 2020-09-29 | 2020-12-29 | 西北工业大学 | Robust self-supervised learning single-frame image depth estimation method |
CN112308918B (en) * | 2020-10-26 | 2024-03-29 | 杭州电子科技大学 | Non-supervision monocular vision odometer method based on pose decoupling estimation |
CN112308918A (en) * | 2020-10-26 | 2021-02-02 | 杭州电子科技大学 | Unsupervised monocular vision odometer method based on pose decoupling estimation |
CN112348843A (en) * | 2020-10-29 | 2021-02-09 | 北京嘀嘀无限科技发展有限公司 | Method and device for adjusting depth image prediction model and electronic equipment |
CN112184611A (en) * | 2020-11-03 | 2021-01-05 | 支付宝(杭州)信息技术有限公司 | Image generation model training method and device |
CN112561978A (en) * | 2020-12-18 | 2021-03-26 | 北京百度网讯科技有限公司 | Training method of depth estimation network, depth estimation method of image and equipment |
CN112561978B (en) * | 2020-12-18 | 2023-11-17 | 北京百度网讯科技有限公司 | Training method of depth estimation network, depth estimation method of image and equipment |
CN112819853A (en) * | 2021-02-01 | 2021-05-18 | 太原理工大学 | Semantic prior-based visual odometer method |
CN112819853B (en) * | 2021-02-01 | 2023-07-25 | 太原理工大学 | Visual odometer method based on semantic priori |
CN113012191B (en) * | 2021-03-11 | 2022-09-02 | 中国科学技术大学 | Laser mileage calculation method based on point cloud multi-view projection graph |
CN113012191A (en) * | 2021-03-11 | 2021-06-22 | 中国科学技术大学 | Laser mileage calculation method based on point cloud multi-view projection graph |
CN113160294A (en) * | 2021-03-31 | 2021-07-23 | 中国科学院深圳先进技术研究院 | Image scene depth estimation method and device, terminal equipment and storage medium |
CN113538335A (en) * | 2021-06-09 | 2021-10-22 | 香港中文大学深圳研究院 | In-vivo relative positioning method and device of wireless capsule endoscope |
CN113570658A (en) * | 2021-06-10 | 2021-10-29 | 西安电子科技大学 | Monocular video depth estimation method based on depth convolutional network |
WO2023109221A1 (en) * | 2021-12-14 | 2023-06-22 | 北京地平线信息技术有限公司 | Method and apparatus for determining homography matrix, medium, device, and program product |
CN114526728A (en) * | 2022-01-14 | 2022-05-24 | 浙江大学 | Monocular vision inertial navigation positioning method based on self-supervision deep learning |
CN114526728B (en) * | 2022-01-14 | 2023-12-05 | 浙江大学 | Monocular vision inertial navigation positioning method based on self-supervision deep learning |
CN114463420A (en) * | 2022-01-29 | 2022-05-10 | 北京工业大学 | Visual mileage calculation method based on attention convolution neural network |
WO2023165093A1 (en) * | 2022-03-01 | 2023-09-07 | 上海商汤智能科技有限公司 | Training method for visual inertial odometer model, posture estimation method and apparatuses, electronic device, computer-readable storage medium, and program product |
CN114663509A (en) * | 2022-03-23 | 2022-06-24 | 北京科技大学 | Self-supervision monocular vision odometer method guided by key point thermodynamic diagram |
CN114998411A (en) * | 2022-04-29 | 2022-09-02 | 中国科学院上海微系统与信息技术研究所 | Self-supervision monocular depth estimation method and device combined with space-time enhanced luminosity loss |
CN114998411B (en) * | 2022-04-29 | 2024-01-09 | 中国科学院上海微系统与信息技术研究所 | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss |
WO2024012405A1 (en) * | 2022-07-11 | 2024-01-18 | 华为技术有限公司 | Calibration method and apparatus |
CN117197229A (en) * | 2023-09-22 | 2023-12-08 | 北京科技大学顺德创新学院 | Multi-stage estimation monocular vision odometer method based on brightness alignment |
CN117197229B (en) * | 2023-09-22 | 2024-04-19 | 北京科技大学顺德创新学院 | Multi-stage estimation monocular vision odometer method based on brightness alignment |
CN117456531A (en) * | 2023-12-25 | 2024-01-26 | 乐山职业技术学院 | Multi-view pure rotation anomaly identification and automatic mark training method, equipment and medium |
CN117456531B (en) * | 2023-12-25 | 2024-03-19 | 乐山职业技术学院 | Multi-view pure rotation anomaly identification and automatic mark training method, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111369608A (en) | Visual odometry method based on image depth estimation | |
Shamwell et al. | Unsupervised deep visual-inertial odometry with online error correction for RGB-D imagery | |
CN111311666B (en) | Monocular vision odometer method integrating edge features and deep learning | |
CN108961327B (en) | Monocular depth estimation method and device, equipment and storage medium thereof | |
CN107945204B (en) | Pixel-level image matting method based on generation countermeasure network | |
US20210142095A1 (en) | Image disparity estimation | |
US10636151B2 (en) | Method for estimating the speed of movement of a camera | |
CN110689562A (en) | Trajectory loop detection optimization method based on generation of countermeasure network | |
Shamwell et al. | Vision-aided absolute trajectory estimation using an unsupervised deep network with online error correction | |
CN113272713B (en) | System and method for performing self-improved visual odometry | |
US11082633B2 (en) | Method of estimating the speed of displacement of a camera | |
CN110675418A (en) | Target track optimization method based on DS evidence theory | |
CN112233179B (en) | Visual odometer measuring method | |
CN110009674A (en) | Monocular image depth of field real-time computing technique based on unsupervised deep learning | |
CN113963240A (en) | Comprehensive detection method for multi-source remote sensing image fusion target | |
Chen et al. | A stereo visual-inertial SLAM approach for indoor mobile robots in unknown environments without occlusions | |
CN111998862A (en) | Dense binocular SLAM method based on BNN | |
CN108986150A (en) | A kind of image light stream estimation method and system based on non-rigid dense matching | |
CN117274515A (en) | Visual SLAM method and system based on ORB and NeRF mapping | |
CN112184767A (en) | Method, device, equipment and storage medium for tracking moving object track | |
Liu et al. | Real-time dense construction with deep multi-view stereo using camera and IMU sensors | |
CN117367427A (en) | Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment | |
Yuan et al. | RGB-D DSO: Direct sparse odometry with RGB-D cameras for indoor scenes | |
Pirvu et al. | Depth distillation: unsupervised metric depth estimation for UAVs by finding consensus between kinematics, optical flow and deep learning | |
CN113673313B (en) | Gesture recognition method based on hierarchical convolutional neural network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200703 |