CN114663496A - Monocular visual odometry method based on a Kalman pose estimation network
- Publication number: CN114663496A
- Application number: CN202210290482.3A
- Authority: CN (China)
- Prior art keywords: pose, estimation network, loss function, depth
- Legal status: Granted
Classifications
- G06T7/70: Determining position or orientation of objects or cameras
- G06F18/25: Fusion techniques
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- G06T7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06T7/55: Depth or shape recovery from multiple images
- G06T2207/10016: Video; image sequence
- G06T2207/10028: Range image; depth image; 3D point clouds
- G06T2207/20076: Probabilistic image processing
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention provides a monocular visual odometry method based on a Kalman pose estimation network, belonging to the technical field of computer vision. The method comprises the following steps: constructing a depth estimation network and a pose estimation network based on Kalman filtering; calculating a motion-weighted photometric error loss function over a video image sequence from the pose transformation between each pair of adjacent frames output by the pose estimation network and the depth image of each input frame output by the depth estimation network; introducing a variational autoencoder structure into the constructed pose estimation network and depth estimation network, and calculating a variational autoencoder loss function; training the pose estimation network and the depth estimation network with the obtained photometric error loss function and variational autoencoder loss function, using a training strategy designed for the missing-frame case; and estimating the camera pose corresponding to each frame with the trained pose estimation network. The method improves the accuracy of camera pose estimation and adapts to missing frames.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a monocular visual odometry method based on a Kalman pose estimation network.
Background
Visual odometry, a component of simultaneous localization and mapping (SLAM), is widely applied in robot navigation, autonomous driving, augmented reality, wearable computing, and related fields. Visual odometry estimates the current position and orientation of a camera from input video frames. According to the type and number of sensors, it can be classified into monocular visual odometry, stereo visual odometry, visual-inertial odometry, and so on. Monocular visual odometry has the advantages of requiring only a single camera, low hardware requirements, and no need for rectification.
Traditional visual odometry first extracts and matches image features, then estimates the relative pose between adjacent frames from their geometric relationship. This approach has achieved good results in practice and remains the mainstream, but it struggles to balance computational cost against robustness.
Monocular visual odometry based on deep learning can be divided into supervised and self-supervised methods. Self-supervised methods require only input video frames: no ground-truth poses need to be collected and no additional equipment is needed, so they are more widely applicable than supervised methods.
Many existing self-supervised methods do not consider the association between frames, leaving inter-frame information underused; as a result, the trained network struggles to estimate accurate poses and cannot adapt to missing frames. In addition, moving objects in the scene are inconsistent with the Euclidean transformation induced by camera motion and violate the static-scene assumption, so the scene motion is hard to describe with a single Euclidean transformation and the network's estimates are biased.
Disclosure of Invention
Embodiments of the invention provide a monocular visual odometry method based on a Kalman pose estimation network, which improves the accuracy of camera pose estimation and adapts to missing frames. The technical scheme is as follows:
An embodiment of the invention provides a monocular visual odometry method based on a Kalman pose estimation network, comprising the following steps:
constructing a depth estimation network and a pose estimation network based on Kalman filtering, wherein the pose estimation network outputs the pose transformation between each pair of adjacent input frames and the depth estimation network outputs a depth image for each input frame;
calculating a motion-weighted photometric error loss function over a video image sequence from the output pose transformation between each pair of adjacent frames and the depth image of each input frame;
introducing a variational autoencoder structure into the constructed pose estimation network and depth estimation network, and calculating a variational autoencoder loss function;
training the pose estimation network and the depth estimation network with the obtained photometric error loss function and variational autoencoder loss function, using a training strategy for the missing-frame case;
and estimating the camera pose corresponding to each frame of the video image sequence whose poses are to be estimated with the trained pose estimation network.
Further, the pose estimation network includes a pose measurement network, a pose weighted fusion network, a pose update network, and a pose prediction network; wherein,
the input adjacent frames $I_{t-1}$ and $I_t$ are encoded by the pose measurement network to obtain the pose measurement vector $C_{measure,t}$ at time t:
$$C_{measure,t} = \mathrm{Measure}(I_{t-1}, I_t)$$
where $I_{t-1}$ and $I_t$ are the images at times t-1 and t, and Measure() is the pose measurement network;
the pose measurement vector $C_{measure,t}$ and the pose prediction vector $C_{pred,t}$ are input to the pose weighted fusion network to obtain the pose weighted fusion vector $C_{fuse,t}$ at time t:
$$C_{fuse,t} = (1 - W_t) \cdot C_{measure,t} + W_t \cdot C_{pred,t}$$
where $W_t$ is the weight in $[0, 1]$ output by the last fully connected layer of the pose weighted fusion network; $C_{pred,t}$ is the pose prediction vector at time t output by the pose prediction network when the adjacent frames $I_{t-2}$, $I_{t-1}$ were input to the pose estimation network, $C_{pred,t} = \mathrm{Predict}(C_{fuse,t-1})$, with $C_{fuse,t-1}$ the pose weighted fusion vector at time t-1 and Predict() the pose prediction network;
the pose weighted fusion vector $C_{fuse,t}$ is input to the pose update network to estimate the pose transformation $T_{t \to t-1}$:
$$T_{t \to t-1} = \mathrm{Update}(C_{fuse,t})$$
where Update() is the pose update network and $T_{t \to t-1}$ is the 6-degree-of-freedom relative pose vector between $I_{t-1}$ and $I_t$, comprising a relative rotation and a relative displacement.
Furthermore, both the pose estimation network and the depth estimation network adopt encoder-decoder structures.
Further, calculating the motion-weighted photometric error loss function over a video image sequence from the output pose transformation between each pair of adjacent frames and the depth image of each input frame comprises:
chaining (cumulatively multiplying) the pose transformations between each pair of adjacent frames output by the pose estimation network to obtain pose transformations over longer periods, and, based on these, calculating motion-weighted photometric errors between images;
and calculating the motion-weighted photometric error loss function of the video image sequence from the computed photometric errors.
Further, chaining the pose transformations between adjacent frames to obtain pose transformations over longer periods, and calculating motion-weighted photometric errors between images based on them, comprises:
for a video image sequence of length N with corresponding times $t_0, t_1, \ldots, t_{N-1}$, cumulatively multiplying the poses between each pair of adjacent frames output by the pose estimation network to obtain the pose transformation over a longer period,
$$T_{t_j \to t_i} = \prod_{k=i}^{j-1} T_{t_{k+1} \to t_k}, \qquad 0 \le i < j \le N-1,$$
where $T_{t_j \to t_i}$ is the pose transformation from time $t_j$ to time $t_i$ and N is the length of each batch of video image sequences input to the pose estimation network and the depth estimation network;
for a point $p_{t_i}$ in image $I_{t_i}$, restoring its three-dimensional coordinates from the depth image $D_{t_i}$; its corresponding projected point $p_{t_j}$ on image $I_{t_j}$ is expressed as
$$p_{t_j} \sim K \, T_{t_j \to t_i}^{-1} \, D_{t_i}(p_{t_i}) \, K^{-1} \, p_{t_i},$$
where K is the camera intrinsic matrix;
using the resulting motion weighting term $W_{mw}$, calculating the motion-weighted photometric error between $I_{t_i}$ and its reconstruction $\hat{I}_{t_i}$ sampled from $I_{t_j}$:
$$L_p'(t_i, t_j) = W_{mw} \odot \left( \alpha_0 \, \frac{1 - \mathrm{SSIM}(I_{t_i}, \hat{I}_{t_i})}{2} + \alpha_1 \, \| I_{t_i} - \hat{I}_{t_i} \|_1 + \alpha_2 \, \| I_{t_i} - \hat{I}_{t_i} \|_2 \right)$$
where $L_p'(t_i, t_j)$ is the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, $\mathrm{SSIM}(\cdot,\cdot)$ is the structural similarity between the original image $I_{t_i}$ and the reconstructed image $\hat{I}_{t_i}$, $\alpha_0$, $\alpha_1$, $\alpha_2$ are hyper-parameters controlling the proportion of each part, $\odot$ denotes the pixel-wise product, $\|\cdot\|_1$ is the 1-norm, and $\|\cdot\|_2$ is the 2-norm.
Further, before using the motion weighting term $W_{mw}$ to calculate the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, the method further comprises:
determining the pixels involved in the photometric error calculation and marking them with a mask:
$$\text{mask} = \left[ \, \| I_{t_i} - \hat{I}_{t_i} \|_* < \| I_{t_i} - I_{t_j} \|_* \, \right]$$
where $[\cdot]$ is the Iverson bracket, $I_{t_i}$ is the original image at time $t_i$, $I_{t_j}$ is the original image at time $t_j$, $\hat{I}_{t_i}$ is the reconstruction of $I_{t_i}$ obtained by sampling from $I_{t_j}$, and $\|\cdot\|_*$ denotes the photometric error, i.e. a 1-norm or 2-norm;
when calculating the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, only the mask-marked pixels are used.
Further, the photometric error loss function averages the motion-weighted photometric errors over all frame pairs of the sequence:
$$L_p = \frac{1}{|\mathcal{S}|} \sum_{(t_i, t_j) \in \mathcal{S}} L_p'(t_i, t_j)$$
where $L_p$ is the photometric error loss function and $\mathcal{S}$ is the set of frame pairs used for reconstruction.
Further, the variational autoencoder loss function is expressed as:
$$L_{VAE} = \lambda_1 \, \mathrm{KL}\!\left( q_d(c_d \mid x_d) \,\|\, p_\eta(c) \right) + \lambda_2 \, \mathrm{KL}\!\left( q_p(c_p \mid x_p) \,\|\, p_\eta(c) \right) - \mathbb{E}_{c_d \sim q_d(c_d \mid x_d),\, c_p \sim q_p(c_p \mid x_p)}\!\left[ \log p(\hat{x} \mid c_d, c_p) \right]$$
where $L_{VAE}$ is the variational autoencoder loss function; $x_d$, $x_p$ are input images; $\lambda_1$, $\lambda_2$ are hyper-parameters; $p_\eta(c)$ is the prior distribution with argument c; $q_d(c_d \mid x_d)$ is the sampling distribution of the depth estimation network's code $c_d$; $q_p(c_p \mid x_p)$ is the sampling distribution of the pose estimation network's code $c_p$; $\mathrm{KL}(\cdot \| \cdot)$ is the KL divergence, so $\mathrm{KL}(q_d(c_d \mid x_d) \| p_\eta(c))$ is the divergence of $q_d(c_d \mid x_d)$ from $p_\eta(c)$ and $\mathrm{KL}(q_p(c_p \mid x_p) \| p_\eta(c))$ the divergence of $q_p(c_p \mid x_p)$ from $p_\eta(c)$; $p(\hat{x} \mid c_d, c_p)$ is the probability distribution of the reconstructed image $\hat{x}$ generated from the outputs obtained by feeding $c_d$ and $c_p$ into the decoders of the depth estimation network and the pose estimation network, respectively; and $\mathbb{E}[\cdot]$ is the mathematical expectation, taken under $c_d \sim q_d(c_d \mid x_d)$ and $c_p \sim q_p(c_p \mid x_p)$.
Further, training the pose estimation network and the depth estimation network with a training strategy for the missing-frame case, based on the obtained photometric error loss function and variational autoencoder loss function, comprises:
for the output of the depth estimation network, computing a depth smoothing loss function:
$$L_s = \left| \partial_x d_t \right| e^{-\left| \partial_x I_t \right|} + \left| \partial_y d_t \right| e^{-\left| \partial_y I_t \right|}$$
where $d_t$ is the disparity, inversely proportional to the depth image $D_t$; $\partial_x$, $\partial_y$ denote partial derivatives in the x and y directions; and $I_t$ is the image at time t;
determining the final loss function L from the obtained depth smoothing loss function, photometric error loss function, and variational autoencoder loss function:
$$L = L_p + \lambda L_s + L_{VAE}$$
where $\lambda$ is a hyper-parameter controlling the weight of the depth smoothing loss function, $L_p$ is the photometric error loss function, and $L_{VAE}$ is the variational autoencoder loss function;
and training the pose estimation network and the depth estimation network with the obtained final loss function, using the training strategy for the missing-frame case.
Further, training the pose estimation network and the depth estimation network with the training strategy for the missing-frame case comprises:
inputting all images of a batch of video image sequences into the pose estimation network and the depth estimation network, and training both networks;
and inputting all images of a batch of video image sequences into the depth estimation network while setting one or more frames of the batch to zero before inputting them into the pose estimation network, and training both networks.
The monocular visual odometry method based on the Kalman pose estimation network provided by the embodiments of the invention has at least the following advantages:
(1) Many existing self-supervised methods do not consider the association between frames and underuse inter-frame information, so the trained network struggles to estimate accurate poses and cannot adapt to missing frames. This embodiment constructs a pose estimation network based on Kalman filtering and designs, on top of it, a training strategy for the missing-frame case, so the pose estimation network can exploit inter-frame information when estimating the current pose and is better suited to missing frames;
(2) Moving objects that may exist in the scene are inconsistent with the Euclidean transformation of the scene, violating the static-scene assumption, so the scene motion is hard to describe with a single Euclidean transformation and the pose estimation network's results are biased. This embodiment designs a motion-weighted photometric error that down-weights such pixels, reducing the interference of moving objects during training.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described here are only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the monocular visual odometry method based on a Kalman pose estimation network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of the pose estimation network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the workflow of the monocular visual odometry method based on a Kalman pose estimation network according to an embodiment of the present invention;
fig. 4 is a schematic diagram of the trajectories estimated on sequences 09 and 10 of the KITTI odometry dataset by the method provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the invention provides a monocular visual odometry method based on a Kalman pose estimation network, including:
S101, constructing a depth estimation network (DepthNet) and a pose estimation network (KF-PoseNet) based on Kalman filtering; the pose estimation network outputs the pose transformation between each pair of adjacent input frames, and the depth estimation network outputs a depth image for each input frame;
As shown in fig. 2, the pose estimation network comprises a pose measurement network, a pose weighted fusion network, a pose update network, and a pose prediction network, configured as shown in table 1.
The pose measurement network comprises a ResNet50 backbone, three convolutional layers, and a global average pooling layer. The first two of the three convolutional layers use the ReLU (Rectified Linear Unit) activation function; the last is a plain convolutional layer without activation. The input of the pose measurement network passes through ResNet50, then sequentially through the three convolutional layers, and is finally output through the global average pooling layer; the pose measurement network uses the ResNet50 structure as its encoder.
The pose weighted fusion network comprises 4 fully connected layers and a weighted fusion layer. The first three fully connected layers use ReLU as the activation function; the last uses a Sigmoid. $C_{measure,t}$ and $C_{pred,t}$ are fed into the first fully connected layer and then pass through the remaining three, yielding a weight coefficient in the range 0 to 1; this weight coefficient, together with $C_{measure,t}$ and $C_{pred,t}$, is sent into the weighted fusion layer.
The pose update network comprises 4 fully connected layers connected in sequence, the first three of which use ReLU as the activation function.
Like the pose update network, the pose prediction network also comprises 4 fully connected layers connected in sequence.
TABLE 1 KF-PoseNet network architecture
In this embodiment, the working process of the pose estimation network is as follows:
The input adjacent frames $I_{t-1}$ and $I_t$ are encoded by the pose measurement network to obtain the pose measurement vector $C_{measure,t}$ at time t:
$$C_{measure,t} = \mathrm{Measure}(I_{t-1}, I_t)$$
where $I_{t-1}$ and $I_t$ are the images at times t-1 and t, and Measure() is the pose measurement network. Note that $C_{measure,t}$ is not a 6-degree-of-freedom pose vector, but a coded vector of the pose information of the image pair $(I_{t-1}, I_t)$.
The pose measurement vector $C_{measure,t}$ and the pose prediction vector $C_{pred,t}$ are input to the pose weighted fusion network to obtain the pose weighted fusion vector $C_{fuse,t}$ at time t:
$$C_{fuse,t} = (1 - W_t) \cdot C_{measure,t} + W_t \cdot C_{pred,t}$$
where $W_t = \mathrm{Weight}(C_{measure,t}, C_{pred,t})$ is the weight in $[0, 1]$ output by the last fully connected layer of the pose weighted fusion network, with Weight() denoting its 4 fully connected layers; $C_{pred,t}$ is the pose prediction vector at time t output by the pose prediction network when the adjacent frames $I_{t-2}$, $I_{t-1}$ were input to the pose estimation network, $C_{pred,t} = \mathrm{Predict}(C_{fuse,t-1})$, with $C_{fuse,t-1}$ the pose weighted fusion vector at time t-1 and Predict() the pose prediction network.
The pose weighted fusion vector $C_{fuse,t}$ is input to the pose update network to estimate the final pose transformation $T_{t \to t-1}$:
$$T_{t \to t-1} = \mathrm{Update}(C_{fuse,t})$$
where Update() is the pose update network and $T_{t \to t-1}$ is the 6-degree-of-freedom relative pose vector between $I_{t-1}$ and $I_t$.
As shown in fig. 3, the input of KF-PoseNet is two adjacent frames and the output is a 6-DoF relative pose vector, whose first three elements represent the 3-DoF relative rotation R and whose last three elements represent the 3-DoF relative displacement t.
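A minimal PyTorch sketch may help make this measure-fuse-update-predict data flow concrete. The module names mirror Measure(), Weight(), Update(), and Predict() above, but the stand-in convolutional encoder and all layer widths are illustrative assumptions, not the configuration of table 1:

```python
# Sketch of KF-PoseNet's Kalman-style step (assumed layer sizes, not Table 1).
import torch
import torch.nn as nn

class KFPoseNet(nn.Module):
    def __init__(self, code_dim=1024):
        super().__init__()
        # Measure(): encodes a concatenated image pair into a pose code
        # (the patent uses ResNet50 + three conv layers + global average pooling).
        self.measure = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, code_dim, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Weight(): 4 fully connected layers, Sigmoid last -> W_t in [0, 1].
        self.weight = nn.Sequential(
            nn.Linear(2 * code_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1), nn.Sigmoid())
        # Update(): maps the fused code to the 6-DoF relative pose (r, t).
        self.update = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 6))
        # Predict(): predicts the next step's pose code from the fused code.
        self.predict = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, code_dim))

    def step(self, img_prev, img_cur, c_pred):
        c_meas = self.measure(torch.cat([img_prev, img_cur], dim=1))
        w = self.weight(torch.cat([c_meas, c_pred], dim=1))   # W_t
        c_fuse = (1 - w) * c_meas + w * c_pred                # C_fuse,t
        pose = self.update(c_fuse)                            # T_{t -> t-1}
        c_pred_next = self.predict(c_fuse)                    # C_pred,t+1
        return pose, c_pred_next
```

Over a sequence, step() would be called frame by frame, feeding c_pred_next back in as c_pred; how the first prediction code is seeded (e.g. zeros) is likewise an assumption here.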
In this embodiment, both the pose estimation network and the depth estimation network adopt encoder-decoder structures. The encoder of the pose estimation network is the ResNet50 structure inside the pose measurement network; its decoder is the remainder: the part of the pose measurement network other than ResNet50, plus the pose weighted fusion network, the pose prediction network, and the pose update network.
In this embodiment, the depth estimation network (DepthNet) also uses the ResNet50 structure as its encoder, uses a multilayer deconvolution structure similar to the DispNet decoder as its decoder connected to the encoder through skip links, and applies a Sigmoid activation at the output layer. The input of DepthNet is a single frame and the output is a normalized disparity d. To obtain the depth D, the disparity is converted by $D = 1/(a \cdot d + b)$, where a and b are parameters limiting the output range so that the depth lies between 0.1 and 100.
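As a minimal sketch, one way to realize this conversion is below, assuming d = 1 maps to the minimum depth and d = 0 to the maximum; the patent states only the output range, so this endpoint mapping is an assumption:

```python
def disp_to_depth(d, min_depth=0.1, max_depth=100.0):
    """Convert normalized disparity d in [0, 1] to depth D = 1/(a*d + b)."""
    b = 1.0 / max_depth                      # d = 0 -> depth = max_depth
    a = 1.0 / min_depth - 1.0 / max_depth    # d = 1 -> depth = min_depth
    return 1.0 / (a * d + b)
```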
In this embodiment, in order to control the memory usage and keep the details as much as possible, the input RGB images of the pose estimation network and the depth estimation network are scaled to 832 × 256.
In this embodiment, a pair of adjacent frames consists of the image $I_t$ at the current time t and the image $I_{t-1}$ at the previous time t-1. The adjacent frames $I_t$ and $I_{t-1}$ are input to the pose estimation network and the depth estimation network to obtain the pose transformation $T_{t \to t-1}$ between them and the depth image $D_t$ of each input frame.
S102, calculating a motion-weighted photometric error loss function over the video image sequence from the output pose transformation between each pair of adjacent frames and the depth image of each input frame; specifically:
A1, chaining the pose transformations between adjacent frames output by the pose estimation network to obtain pose transformations over longer periods, and calculating motion-weighted photometric errors between images based on them;
In this embodiment, there may be fast-moving objects in the scene. Such objects are inconsistent with the Euclidean transformation induced by the camera motion, so treating their pixels the same as all others when training the network is clearly unreasonable. Since the motion amplitude in the dataset is modest and illumination changes are not pronounced, the brightness of a pixel at the same position in two adjacent frames does not change much. Based on this, to reduce the influence of fast-moving objects, the invention designs a motion-weighted photometric error. To make the network consider the consistency of pose transformations over longer horizons, this embodiment uses consecutive multi-frame images to compute photometric errors constrained by long-horizon poses, specifically:
For a video image sequence of length N with corresponding times $t_0, t_1, \ldots, t_{N-1}$, the poses between each pair of adjacent frames output by the pose estimation network are cumulatively multiplied to obtain the pose transformation over a longer period:
$$T_{t_j \to t_i} = \prod_{k=i}^{j-1} T_{t_{k+1} \to t_k}, \qquad 0 \le i < j \le N-1,$$
where $T_{t_j \to t_i}$ is the pose transformation from time $t_j$ to time $t_i$, and N is the length of each batch of video image sequences input to the pose estimation network and the depth estimation network.
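A short sketch of this accumulation with 4x4 homogeneous transforms, assuming adjacent_T[k] stores $T_{t_{k+1} \to t_k}$ as a 4x4 matrix (the list layout is an assumption):

```python
import torch

def accumulate_pose(adjacent_T, i, j):
    """Compose T_{t_j -> t_i} from adjacent-frame transforms, for i < j."""
    T = torch.eye(4)
    for k in range(i, j):
        # rightmost factor T_{t_j -> t_{j-1}} is applied first to a point in frame t_j
        T = T @ adjacent_T[k]
    return T
```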
Then, for a point $p_{t_i}$ in image $I_{t_i}$, its three-dimensional coordinates can be restored from the depth image $D_{t_i}$; its corresponding projected point $p_{t_j}$ on image $I_{t_j}$ can be calculated as
$$p_{t_j} \sim K \, T_{t_j \to t_i}^{-1} \, D_{t_i}(p_{t_i}) \, K^{-1} \, p_{t_i},$$
where K is the camera intrinsic matrix; the formula omits part of the homogeneous-coordinate bookkeeping.
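This projection is the basis of view reconstruction: every pixel of $I_{t_i}$ is back-projected with its depth, moved into the other camera, re-projected, and used to sample $I_{t_j}$. A minimal PyTorch sketch of that inverse warp follows; the (B, C, H, W) tensor layout, the single shared intrinsic matrix K, and the border padding are assumptions:

```python
import torch
import torch.nn.functional as F

def reconstruct(img_j, depth_i, T_i_to_j, K):
    """Reconstruct I_{t_i} by sampling I_{t_j}; T_i_to_j is (B, 4, 4), K is (3, 3)."""
    B, _, H, W = depth_i.shape
    dev = depth_i.device
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().view(3, -1)  # (3, HW)
    cam = (K.inverse() @ pix) * depth_i.view(B, 1, -1)       # back-project (B, 3, HW)
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)   # homogeneous (B, 4, HW)
    proj = K @ (T_i_to_j @ cam)[:, :3]                       # into camera j (B, 3, HW)
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)           # perspective divide
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,          # normalize to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    return F.grid_sample(img_j, grid, padding_mode="border", align_corners=True)
```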
Finally, using the obtained motion weighting term $W_{mw}$, the motion-weighted photometric error between $I_{t_i}$ and its reconstruction $\hat{I}_{t_i}$ is computed:
$$L_p'(t_i, t_j) = W_{mw} \odot \left( \alpha_0 \, \frac{1 - \mathrm{SSIM}(I_{t_i}, \hat{I}_{t_i})}{2} + \alpha_1 \, \| I_{t_i} - \hat{I}_{t_i} \|_1 + \alpha_2 \, \| I_{t_i} - \hat{I}_{t_i} \|_2 \right)$$
where $\mathrm{SSIM}(\cdot,\cdot)$ is the structural similarity between the original image $I_{t_i}$ and the reconstructed image $\hat{I}_{t_i}$, $\alpha_0$, $\alpha_1$, $\alpha_2$ are hyper-parameters controlling the proportion of each part, $\odot$ denotes the pixel-wise product, $\|\cdot\|_1$ is the 1-norm, and $\|\cdot\|_2$ is the 2-norm.
In this embodiment, the motion weighting term $W_{mw}$ weights the computed photometric error pixel by pixel, yielding the motion-weighted photometric error.
Further, when an object that is stationary relative to the camera appears in the field of view, the accuracy of depth estimation can suffer, with the estimated depth tending toward infinity. This embodiment therefore also automatically marks such static pixels and removes them from training: pixels whose reconstruction error is not smaller than the photometric error between the current image and the reference image are regarded as stationary relative to the camera, and the depth network is trained only on pixels whose reconstruction error is smaller than that error (i.e., the pixels involved in the photometric error calculation).
In this embodiment, the pixels involved in the photometric error calculation are determined and marked with a mask:
$$\text{mask} = \left[ \, \| I_{t_i} - \hat{I}_{t_i} \|_* < \| I_{t_i} - I_{t_j} \|_* \, \right]$$
where $[\cdot]$ is the Iverson bracket, $I_{t_i}$ is the original image at time $t_i$, $I_{t_j}$ is the original image at time $t_j$, $\hat{I}_{t_i}$ is the reconstruction of $I_{t_i}$ obtained by sampling from $I_{t_j}$, and $\|\cdot\|_*$ denotes the photometric error, i.e. a 1-norm or 2-norm.
When calculating the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, only the mask-marked pixels are used, and the same pixels are then used for network training.
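A sketch of this automatic mask (a channel-averaged 1-norm variant; averaging over color channels is an assumption):

```python
def stationary_mask(img_i, img_j, img_i_recon):
    """1 where the reconstruction beats simply copying I_{t_j}; 0 for static pixels."""
    err_recon = (img_i - img_i_recon).abs().mean(dim=1, keepdim=True)
    err_ident = (img_i - img_j).abs().mean(dim=1, keepdim=True)
    return (err_recon < err_ident).float()
```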
A2, calculating the motion-weighted photometric error loss function $L_p$ of the video image sequence from the computed photometric errors:
$$L_p = \frac{1}{|\mathcal{S}|} \sum_{(t_i, t_j) \in \mathcal{S}} L_p'(t_i, t_j)$$
where $L_p'$ denotes the motion-weighted photometric error and $\mathcal{S}$ is the set of frame pairs used for reconstruction.
S103, introducing a variational autoencoder structure into the constructed pose estimation network and depth estimation network, and calculating the variational autoencoder loss function.
In this embodiment, KF-PoseNet and DepthNet both use encoder-decoder structures. To make the decoder output robust to noise in its input coding and to improve the generalization ability of the networks, a variational autoencoder (VAE) structure is introduced into both KF-PoseNet and DepthNet.
Take the depth estimation network as an example.
The encoder of the depth estimation network maps the input image $x_d = I_t$ into the coding space, yielding the mean vector $E_d(x_d)$.
Let $q_d(c_d \mid x_d)$ be the distribution of the code $c_d$ to be input to the decoder, set to a Gaussian with mean $E_d(x_d)$ and covariance $\Sigma_d$ of the input image: $q_d(c_d \mid x_d) = \mathcal{N}(E_d(x_d), \Sigma_d)$. The code $c_d$ is obtained by random sampling from this distribution, written $c_d \sim q_d(c_d \mid x_d)$.
The code $c_d$ is then input to the decoder to obtain the depth image of the input image.
To satisfy the back-propagation requirements of a deep network, this embodiment makes the random sampling in coding space differentiable through the following reparameterization: let $\eta$ be a random vector drawn from a zero-mean, unit-covariance Gaussian, $\eta \sim \mathcal{N}(0, I)$, with I the identity matrix; then sampling $c_d \sim q_d(c_d \mid x_d)$ can be realized as $c_d = E_d(x_d) + \Sigma_d \, \eta$, where $\Sigma_d$ is the covariance of the input image.
The pose estimation network is handled in the same way.
Further, the VAE loss function $L_{VAE}$ is calculated as:
$$L_{VAE} = \lambda_1 \, \mathrm{KL}\!\left( q_d(c_d \mid x_d) \,\|\, p_\eta(c) \right) + \lambda_2 \, \mathrm{KL}\!\left( q_p(c_p \mid x_p) \,\|\, p_\eta(c) \right) - \mathbb{E}_{c_d \sim q_d(c_d \mid x_d),\, c_p \sim q_p(c_p \mid x_p)}\!\left[ \log p(\hat{x} \mid c_d, c_p) \right]$$
where $x_d$, $x_p$ are input images; the hyper-parameters $\lambda_1$, $\lambda_2$ weight the target terms; $p_\eta(c)$ is the prior distribution with argument c; $q_d(c_d \mid x_d)$ is the sampling distribution of the depth estimation network's code $c_d$; $q_p(c_p \mid x_p)$ is the sampling distribution of the pose estimation network's code $c_p$; $\mathrm{KL}(\cdot \| \cdot)$ is the KL divergence; $p(\hat{x} \mid c_d, c_p)$ is the probability distribution of the reconstructed image generated from the outputs obtained by feeding $c_d$ and $c_p$ into the decoders of the depth estimation network and the pose estimation network, respectively; and $\mathbb{E}[\cdot]$ is the mathematical expectation under $c_d \sim q_d(c_d \mid x_d)$ and $c_p \sim q_p(c_p \mid x_p)$. The first two terms are KL divergences that penalize the latent code distributions for deviating from the prior; minimizing the last term, a non-negative log-likelihood, is equivalent to minimizing the photometric error loss function. In practice, therefore, the VAE loss function consists of only the first two terms.
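A sketch of the reparameterized sampling and of the two KL terms that remain in $L_{VAE}$, assuming diagonal Gaussian posteriors (parameterized by mean and log-variance) against a standard normal prior; the diagonal parameterization is an assumption on top of the description above:

```python
import torch

def sample_code(mean, logvar):
    eta = torch.randn_like(mean)                  # eta ~ N(0, I)
    return mean + torch.exp(0.5 * logvar) * eta   # c = E(x) + sigma * eta

def kl_to_standard_normal(mean, logvar):
    # KL( N(mean, diag(exp(logvar))) || N(0, I) ), summed over code dimensions
    return 0.5 * torch.sum(logvar.exp() + mean.pow(2) - 1.0 - logvar, dim=-1)

# L_VAE keeps only the two KL terms; the reconstruction term is covered by L_p:
# loss_vae = lam1 * kl_to_standard_normal(mu_d, lv_d).mean() \
#          + lam2 * kl_to_standard_normal(mu_p, lv_p).mean()
```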
S104, training the pose estimation network and the depth estimation network with a training strategy for the missing-frame case, based on the obtained photometric error loss function and variational autoencoder loss function; specifically:
First, for a texture-stable plane in three-dimensional space, its depth in the depth image tends not to vary drastically. Therefore, for the output of the depth estimation network, this embodiment also computes a depth smoothing loss function $L_s$:
$$L_s = \left| \partial_x d_t \right| e^{-\left| \partial_x I_t \right|} + \left| \partial_y d_t \right| e^{-\left| \partial_y I_t \right|}$$
where $d_t$ is the disparity, inversely proportional to the depth image $D_t$; $\partial_x$, $\partial_y$ denote partial derivatives in the x and y directions; and $I_t$ is the image at time t.
in this embodiment, the depth smoothing loss function is calculated for each frame of image in each batch;
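A sketch of this edge-aware smoothness term; mean-normalizing the disparity before differentiating is an assumption borrowed from common practice rather than stated in the patent:

```python
import torch

def smooth_loss(disp, img):
    """Penalize disparity gradients except where the image itself has edges."""
    d = disp / (disp.mean(dim=(2, 3), keepdim=True) + 1e-7)
    dx = (d[:, :, :, :-1] - d[:, :, :, 1:]).abs()
    dy = (d[:, :, :-1, :] - d[:, :, 1:, :]).abs()
    ix = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    iy = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (dx * torch.exp(-ix)).mean() + (dy * torch.exp(-iy)).mean()
```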
Then, the final loss function L is determined from the obtained depth smoothing loss function, photometric error loss function, and variational autoencoder loss function:
$$L = L_p + \lambda L_s + L_{VAE}$$
where $\lambda$ is a hyper-parameter controlling the weight of the depth smoothing loss function, $L_p$ is the photometric error loss function, and $L_{VAE}$ is the variational autoencoder loss function.
Finally, the pose estimation network and the depth estimation network are trained with the obtained final loss function, using the training strategy for the missing-frame case.
S105, estimating the camera pose corresponding to each frame of the video image sequence whose poses are to be estimated with the trained pose estimation network.
In this embodiment, the design of the Kalman-filtering-based pose estimation network (KF-PoseNet) borrows the idea of Kalman filtering: successive estimates are associated in temporal order, so KF-PoseNet adapts better to missing frames.
During training, all images of a batch of video image sequences are input to the pose estimation network and the depth estimation network, and both networks are trained. Further, to handle the missing frames that may occur in visual odometry, all images of a batch are input to the depth estimation network while one or more frames of the batch are set to zero before being input to the pose estimation network, and both networks are trained. For example, when N = 5, a batch feeds 5 consecutive frames to the depth estimation network and each pair of adjacent frames to the pose estimation network; to simulate missing frames, two frames are randomly chosen from the last 3 of the 5 consecutive input frames and set to zero before being input to the pose estimation network for training, while the depth estimation network still receives the complete images.
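A sketch of this frame-dropout step for the pose-network input; the candidate index range mirrors the N = 5 example above (frames are zeroed only among the last three), and the helper name is illustrative:

```python
import random
import torch

def drop_frames(frames, n_drop=2):
    """frames: list of N image tensors; zero n_drop random frames among the last N-2."""
    out = list(frames)
    candidates = range(2, len(frames))                  # e.g. last 3 frames of N = 5
    for idx in random.sample(candidates, k=min(n_drop, len(candidates))):
        out[idx] = torch.zeros_like(frames[idx])
    return out                                          # depth net still gets `frames`
```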
After training is finished, the trained pose estimation network estimates the camera pose corresponding to each frame of the video image sequence whose poses are to be estimated.
The monocular visual odometry method based on the Kalman pose estimation network can effectively estimate the camera pose corresponding to each frame from the input image sequence and adapt to missing frames. The invention is suited to self-supervised monocular visual odometry.
The monocular visual odometry method based on the Kalman pose estimation network provided by the embodiments of the invention has at least the following advantages:
(1) Many existing self-supervised methods do not consider the association between frames and underuse inter-frame information, so the trained network struggles to estimate accurate poses and cannot adapt to missing frames. This embodiment constructs a pose estimation network based on Kalman filtering and designs, on top of it, a training strategy for the missing-frame case, so the pose estimation network can exploit inter-frame information when estimating the current pose and is better suited to missing frames;
(2) Moving objects that may exist in the scene are inconsistent with the Euclidean transformation of the scene, violating the static-scene assumption, so the scene motion is hard to describe with a single Euclidean transformation and the pose estimation network's results are biased. This embodiment designs a motion-weighted photometric error that down-weights such pixels, reducing the interference of moving objects during training.
To verify the effectiveness of the monocular visual odometry method based on the Kalman pose estimation network provided by the embodiment of the invention, its performance is tested with the evaluation indices provided with the KITTI odometry dataset:
(1) Relative translation RMSE (rel. trans.): the average translation RMSE (Root Mean Square Error) over all subsequences of length 100, 200, ..., 800 meters in a sequence, measured in % (i.e., meters of deviation per 100 meters traveled); smaller is better.
(2) Relative rotation RMSE (rel. rot.): the average rotation RMSE over all subsequences of length 100, 200, ..., 800 meters in a sequence, measured in deg/m; smaller is better.
In this embodiment, the eight sequences 00-07 of the KITTI odometry dataset are used as the training and validation sets to train the pose estimation network and the depth estimation network, and the two sequences 09-10 are used to test the performance of the Kalman-filtering-based pose estimation network for self-supervised monocular visual odometry.
The KITTI odometry dataset contains stereo images, radar points, and ground-truth trajectories of urban road environments, acquired with vehicle-mounted cameras and other equipment.
In the implementation, a depth estimation network and a Kalman-filtering-based pose estimation network are constructed, where the pose estimation network outputs the pose transformation between each pair of adjacent input frames and the depth estimation network outputs a depth image for each input frame; a motion-weighted photometric error loss function over the video image sequence is computed from the output pose transformations and depth images; a variational autoencoder structure is introduced into both networks and the variational autoencoder loss function is computed; both networks are trained with the obtained photometric error loss function and variational autoencoder loss function, using the training strategy for the missing-frame case; and the trained pose estimation network estimates the camera pose corresponding to each frame of the video image sequence whose poses are to be estimated.
In this embodiment, the hyper-parameters of the photometric error loss function are $\alpha_0 = 0.85$, $\alpha_1 = 0.1$, $\alpha_2 = 0.05$; the depth smoothing loss parameter is $\lambda = 10^{-3}$; and the VAE loss parameters are $\lambda_1 = \lambda_2 = 0.01$. During training, the initial learning rate is $10^{-4}$ and decays gradually, multiplied by 0.97 after each epoch; 45 epochs are run with the Adam optimizer, with a batch size of 2 and 3 consecutive frames per batch.
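A sketch of this schedule; pose_net, depth_net, loader, and compute_loss are placeholder names for the two networks, the data pipeline, and the combined loss $L = L_p + \lambda L_s + L_{VAE}$ described above:

```python
import itertools
import torch

params = itertools.chain(pose_net.parameters(), depth_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)                      # initial lr 1e-4
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)

for epoch in range(45):                 # 45 passes over the training set
    for batch in loader:                # batch size 2, 3 consecutive frames each
        loss = compute_loss(batch)      # L = L_p + lambda * L_s + L_VAE
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                    # lr *= 0.97 after each epoch
```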
To verify the performance of the method, recent self-supervised deep-learning monocular visual odometry methods were selected for comparison; the experimental results are shown in table 2. The trajectories generated by this embodiment are shown in fig. 4, where the dashed curves are the ground-truth trajectories and the solid curves are the trajectories estimated by this embodiment.
As table 2 shows, the method of this embodiment outperforms the other methods, owing to its better use of information extracted from past instants, its motion weighting of pixels, and its VAE structure.
TABLE 2 comparison of the method of this embodiment with other methods
To verify the significance of each part of the method, ablation experiments were also performed; the results are shown in table 3. "Without Kalman structure" in the second row means the Kalman structure is removed from the network: the decoder of the pose estimation network becomes four convolutional layers, the first three with ReLU activations, with the fourth layer's output passed through global average pooling to obtain the 6-degree-of-freedom pose vector. The third through fifth rows correspond to removing the motion weighting, the VAE structure, and the long-horizon consistency constraint, respectively. "# fc = 6" and "# fc = 2" in the sixth and seventh rows report the pose estimation decoder with different numbers of fully connected layers. The first row, "basic", is the result without any of the three structures. The last row is the complete method.
The results show that the Kalman-like structure lets the network draw on earlier data when estimating the current adjacent-frame pose, making the current estimate more accurate. With motion weighting, training focuses more on pixels of static objects in the environment and weakens the interference of objects inconsistent with the camera's Euclidean transformation. With the VAE structure, the decoder becomes more robust to noise in the encoder output, improving generalization and further improving the results. The complete method achieves the best experimental results, and performance improves as each part is added, demonstrating the significance of every component.
TABLE 3 ablation test results
TABLE 4 experimental results for the missing-frame case
This embodiment also performs an ablation experiment on the training strategy designed for the missing-frame case. At test time, one frame is set to zero at frames 50, 150, and so on, and two frames are set to zero at frames 100, 200, and so on, to evaluate the method under missing frames. The results are shown in table 4. The first row, "without frame training", is the result of training without the missing-frame training strategy; the second row, "without Kalman structure", is the result of the model without the Kalman structure, also trained without the missing-frame training strategy; the third row is the result trained with the missing-frame training strategy. As table 4 shows, the proposed method adapts well to missing frames.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A monocular visual odometry method based on a Kalman pose estimation network, characterized by comprising the following steps:
constructing a depth estimation network and a pose estimation network based on Kalman filtering, wherein the pose estimation network outputs the pose transformation between each pair of adjacent input frames and the depth estimation network outputs a depth image for each input frame;
calculating a motion-weighted photometric error loss function over a video image sequence from the output pose transformation between each pair of adjacent frames and the depth image of each input frame;
introducing a variational autoencoder structure into the constructed pose estimation network and depth estimation network, and calculating a variational autoencoder loss function;
training the pose estimation network and the depth estimation network with the obtained photometric error loss function and variational autoencoder loss function, using a training strategy for the missing-frame case;
and estimating the camera pose corresponding to each frame of the video image sequence whose poses are to be estimated with the trained pose estimation network.
2. The monocular visual odometry method based on a Kalman pose estimation network of claim 1, wherein the pose estimation network comprises a pose measurement network, a pose weighted fusion network, a pose update network, and a pose prediction network; wherein,
the input adjacent frames $I_{t-1}$ and $I_t$ are encoded by the pose measurement network to obtain the pose measurement vector $C_{measure,t}$ at time t:
$$C_{measure,t} = \mathrm{Measure}(I_{t-1}, I_t)$$
where $I_{t-1}$ and $I_t$ are the images at times t-1 and t, and Measure() is the pose measurement network;
the pose measurement vector $C_{measure,t}$ and the pose prediction vector $C_{pred,t}$ are input to the pose weighted fusion network to obtain the pose weighted fusion vector $C_{fuse,t}$ at time t:
$$C_{fuse,t} = (1 - W_t) \cdot C_{measure,t} + W_t \cdot C_{pred,t}$$
where $W_t$ is the weight in $[0, 1]$ output by the last fully connected layer of the pose weighted fusion network; $C_{pred,t}$ is the pose prediction vector at time t output by the pose prediction network when the adjacent frames $I_{t-2}$, $I_{t-1}$ were input to the pose estimation network, $C_{pred,t} = \mathrm{Predict}(C_{fuse,t-1})$, with $C_{fuse,t-1}$ the pose weighted fusion vector at time t-1 and Predict() the pose prediction network;
the pose weighted fusion vector $C_{fuse,t}$ is input to the pose update network to estimate the pose transformation $T_{t \to t-1}$:
$$T_{t \to t-1} = \mathrm{Update}(C_{fuse,t})$$
where Update() is the pose update network and $T_{t \to t-1}$ is the 6-degree-of-freedom relative pose vector between $I_{t-1}$ and $I_t$, comprising a relative rotation and a relative displacement.
3. The monocular visual odometry method based on a Kalman pose estimation network of claim 2, wherein the pose estimation network and the depth estimation network both employ encoder-decoder structures.
4. The monocular visual odometry method based on a Kalman pose estimation network of claim 1, wherein calculating the motion-weighted photometric error loss function over a video image sequence from the pose transformation between each pair of output adjacent frames and the depth image of each input frame comprises:
chaining the pose transformations between each pair of adjacent frames output by the pose estimation network to obtain pose transformations over longer periods, and, based on these, calculating motion-weighted photometric errors between images;
and calculating the motion-weighted photometric error loss function of the video image sequence from the computed photometric errors.
5. The monocular visual odometry method based on a Kalman pose estimation network of claim 4, wherein chaining the pose transformations between adjacent frames to obtain pose transformations over longer periods, and calculating motion-weighted photometric errors between images based on them, comprises:
for a video image sequence of length N with corresponding times $t_0, t_1, \ldots, t_{N-1}$, cumulatively multiplying the poses between each pair of adjacent frames output by the pose estimation network to obtain the pose transformation over a longer period,
$$T_{t_j \to t_i} = \prod_{k=i}^{j-1} T_{t_{k+1} \to t_k}, \qquad 0 \le i < j \le N-1,$$
where $T_{t_j \to t_i}$ is the pose transformation from time $t_j$ to time $t_i$ and N is the length of each batch of video image sequences input to the pose estimation network and the depth estimation network;
for a point $p_{t_i}$ in image $I_{t_i}$, restoring its three-dimensional coordinates from the depth image $D_{t_i}$; its corresponding projected point $p_{t_j}$ on image $I_{t_j}$ is expressed as
$$p_{t_j} \sim K \, T_{t_j \to t_i}^{-1} \, D_{t_i}(p_{t_i}) \, K^{-1} \, p_{t_i},$$
where K is the camera intrinsic matrix;
using the resulting motion weighting term $W_{mw}$, calculating the motion-weighted photometric error between $I_{t_i}$ and its reconstruction $\hat{I}_{t_i}$:
$$L_p'(t_i, t_j) = W_{mw} \odot \left( \alpha_0 \, \frac{1 - \mathrm{SSIM}(I_{t_i}, \hat{I}_{t_i})}{2} + \alpha_1 \, \| I_{t_i} - \hat{I}_{t_i} \|_1 + \alpha_2 \, \| I_{t_i} - \hat{I}_{t_i} \|_2 \right)$$
where $L_p'(t_i, t_j)$ is the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, $\mathrm{SSIM}(\cdot,\cdot)$ is the structural similarity between the original image $I_{t_i}$ and the reconstructed image $\hat{I}_{t_i}$, $\alpha_0$, $\alpha_1$, $\alpha_2$ are hyper-parameters controlling the proportion of each part, $\odot$ denotes the pixel-wise product, $\|\cdot\|_1$ is the 1-norm, and $\|\cdot\|_2$ is the 2-norm.
6. The monocular visual odometry method based on a Kalman pose estimation network of claim 5, wherein before using the obtained motion weighting term $W_{mw}$ to calculate the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, the method further comprises:
determining the pixels involved in the photometric error calculation and marking them with a mask:
$$\text{mask} = \left[ \, \| I_{t_i} - \hat{I}_{t_i} \|_* < \| I_{t_i} - I_{t_j} \|_* \, \right]$$
where $[\cdot]$ is the Iverson bracket, $I_{t_i}$ is the original image at time $t_i$, $I_{t_j}$ is the original image at time $t_j$, $\hat{I}_{t_i}$ is the reconstruction of $I_{t_i}$ obtained by sampling from $I_{t_j}$, and $\|\cdot\|_*$ denotes the photometric error, i.e. a 1-norm or 2-norm;
when calculating the motion-weighted photometric error between $I_{t_i}$ and $\hat{I}_{t_i}$, only the mask-marked pixels are used.
7. The monocular visual odometry method based on a Kalman pose estimation network of claim 6, wherein the photometric error loss function is expressed as:
$$L_p = \frac{1}{|\mathcal{S}|} \sum_{(t_i, t_j) \in \mathcal{S}} L_p'(t_i, t_j)$$
where $L_p$ is the photometric error loss function and $\mathcal{S}$ is the set of frame pairs used for reconstruction.
8. The monocular visual odometry method based on a Kalman pose estimation network of claim 1, wherein the variational autoencoder loss function is expressed as:
$$L_{VAE} = \lambda_1 \, \mathrm{KL}\!\left( q_d(c_d \mid x_d) \,\|\, p_\eta(c) \right) + \lambda_2 \, \mathrm{KL}\!\left( q_p(c_p \mid x_p) \,\|\, p_\eta(c) \right) - \mathbb{E}_{c_d \sim q_d(c_d \mid x_d),\, c_p \sim q_p(c_p \mid x_p)}\!\left[ \log p(\hat{x} \mid c_d, c_p) \right]$$
where $L_{VAE}$ is the variational autoencoder loss function; $x_d$, $x_p$ are input images; $\lambda_1$, $\lambda_2$ are hyper-parameters; $p_\eta(c)$ is the prior distribution with argument c; $q_d(c_d \mid x_d)$ is the sampling distribution of the depth estimation network's code $c_d$; $q_p(c_p \mid x_p)$ is the sampling distribution of the pose estimation network's code $c_p$; $\mathrm{KL}(\cdot \| \cdot)$ is the KL divergence; $p(\hat{x} \mid c_d, c_p)$ is the probability distribution of the reconstructed image generated from the outputs obtained by feeding $c_d$ and $c_p$ into the decoders of the depth estimation network and the pose estimation network, respectively; and $\mathbb{E}[\cdot]$ is the mathematical expectation under $c_d \sim q_d(c_d \mid x_d)$ and $c_p \sim q_p(c_p \mid x_p)$.
9. The monocular visual odometry method based on a Kalman pose estimation network of claim 1, wherein training the pose estimation network and the depth estimation network with a training strategy for the missing-frame case, based on the obtained photometric error loss function and variational autoencoder loss function, comprises:
for the output of the depth estimation network, computing a depth smoothing loss function:
$$L_s = \left| \partial_x d_t \right| e^{-\left| \partial_x I_t \right|} + \left| \partial_y d_t \right| e^{-\left| \partial_y I_t \right|}$$
where $d_t$ is the disparity, inversely proportional to the depth image $D_t$; $\partial_x$, $\partial_y$ denote partial derivatives in the x and y directions; and $I_t$ is the image at time t;
determining the final loss function L from the obtained depth smoothing loss function, photometric error loss function, and variational autoencoder loss function:
$$L = L_p + \lambda L_s + L_{VAE}$$
where $\lambda$ is a hyper-parameter controlling the weight of the depth smoothing loss function, $L_p$ is the photometric error loss function, and $L_{VAE}$ is the variational autoencoder loss function;
and training the pose estimation network and the depth estimation network with the obtained final loss function, using the training strategy for the missing-frame case.
10. The Kalman pose estimation network based monocular visual odometry method of claim 1, wherein the training the pose estimation network and the depth estimation network with a training strategy for frame loss comprises:
inputting all images of a batch of video image sequences into the pose estimation network and the depth estimation network, and training the pose estimation network and the depth estimation network;
inputting all images of the batch of video image sequences into the depth estimation network, zeroing one or more frames of the batch and inputting the result into the pose estimation network, and training the pose estimation network and the depth estimation network, as sketched below.
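Claim 10's two-pass strategy can be sketched as follows. The claimed idea is that the depth network always sees the full sequence while the pose network sometimes sees sequences with one or more frames zeroed out; the drop probability and random sampling scheme below are assumptions:

```python
import torch

def zero_random_frames(seq, p_drop=0.2):
    """Simulate missing frames: zero out one or more images in a
    (B, T, C, H, W) video batch before it reaches the pose network."""
    b, t = seq.shape[:2]
    drop = torch.rand(b, t, device=seq.device) < p_drop
    drop[:, 0] = False                      # keep at least the reference frame
    return seq * (~drop)[:, :, None, None, None].float()

# Two training passes per claim 10 (pose_net / depth_net are assumed modules):
# 1) the full sequences go into both the pose and depth networks;
# 2) the full sequences go into the depth network, while frame-zeroed
#    sequences go into the pose network; both passes optimize the same
#    final loss L from claim 9.
```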
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210290482.3A CN114663496B (en) | 2022-03-23 | 2022-03-23 | Monocular vision odometer method based on Kalman pose estimation network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114663496A (en) | 2022-06-24
CN114663496B (en) | 2022-10-18
Family
ID=82031748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210290482.3A Active CN114663496B (en) | 2022-03-23 | 2022-03-23 | Monocular vision odometer method based on Kalman pose estimation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663496B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150124882A1 (en) * | 2013-11-05 | 2015-05-07 | Arris Enterprises, Inc. | Bit depth variable for high precision data in weighted prediction syntax and semantics |
CN108665496A (en) * | 2018-03-21 | 2018-10-16 | 浙江大学 | A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method |
US20200041276A1 (en) * | 2018-08-03 | 2020-02-06 | Ford Global Technologies, Llc | End-To-End Deep Generative Model For Simultaneous Localization And Mapping |
CN110490928A (en) * | 2019-07-05 | 2019-11-22 | 天津大学 | A kind of camera Attitude estimation method based on deep neural network |
CN110910447A (en) * | 2019-10-31 | 2020-03-24 | 北京工业大学 | Visual odometer method based on dynamic and static scene separation |
US20220036577A1 (en) * | 2020-07-30 | 2022-02-03 | Apical Limited | Estimating camera pose |
CN112102399A (en) * | 2020-09-11 | 2020-12-18 | 成都理工大学 | Visual mileage calculation method based on generative antagonistic network |
CN113108771A (en) * | 2021-03-05 | 2021-07-13 | 华南理工大学 | Movement pose estimation method based on closed-loop direct sparse visual odometer |
CN113483762A (en) * | 2021-07-05 | 2021-10-08 | 河南理工大学 | Pose optimization method and device |
CN114022527A (en) * | 2021-10-20 | 2022-02-08 | 华中科技大学 | Monocular endoscope depth and pose estimation method and device based on unsupervised learning |
Non-Patent Citations (6)
Title |
---|
CHUNHUI ZHAO ET AL.: "Pose estimation for multi-camera systems", 2017 IEEE International Conference on Unmanned Systems (ICUS) * |
UGUR KAYASAL: "Magnetometer Aided Inertial Navigation System: Modeling and Simulation of a Navigation System Based on an IMU and a Magnetometer" (Chinese edition), 28 February 2017 * |
YAN WANG ET AL.: "Unsupervised Learning of Accurate Camera Pose and Depth From Video Sequences With Kalman Filter", IEEE Access * |
ZHOU KAI ET AL.: "Dense visual odometry algorithm fusing edge information in dynamic environments", Journal of Harbin Institute of Technology * |
MENG QINGXIN ET AL.: "Fundamentals of Robotics", 30 September 2006 * |
ZHANG WEIQI: "Research on learning-based monocular simultaneous localization and mapping methods", China Doctoral Dissertations Full-text Database (Information Science and Technology) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115131404A (en) * | 2022-07-01 | 2022-09-30 | 上海人工智能创新中心 | Monocular 3D detection method based on motion estimation depth |
CN115131404B (en) * | 2022-07-01 | 2024-06-14 | 上海人工智能创新中心 | Monocular 3D detection method based on motion estimation depth |
CN115841151A (en) * | 2023-02-22 | 2023-03-24 | 禾多科技(北京)有限公司 | Model training method and device, electronic equipment and computer readable medium |
CN116612182A (en) * | 2023-07-19 | 2023-08-18 | 煤炭科学研究总院有限公司 | Monocular pose estimation method and monocular pose estimation device |
CN116612182B (en) * | 2023-07-19 | 2023-09-29 | 煤炭科学研究总院有限公司 | Monocular pose estimation method and monocular pose estimation device |
CN117214860A (en) * | 2023-08-14 | 2023-12-12 | 北京科技大学顺德创新学院 | Laser radar odometer method based on twin feature pyramid and ground segmentation |
CN117214860B (en) * | 2023-08-14 | 2024-04-19 | 北京科技大学顺德创新学院 | Laser radar odometer method based on twin feature pyramid and ground segmentation |
CN117197229A (en) * | 2023-09-22 | 2023-12-08 | 北京科技大学顺德创新学院 | Multi-stage estimation monocular vision odometer method based on brightness alignment |
CN117197229B (en) * | 2023-09-22 | 2024-04-19 | 北京科技大学顺德创新学院 | Multi-stage estimation monocular vision odometer method based on brightness alignment |
CN117974721A (en) * | 2024-04-01 | 2024-05-03 | 合肥工业大学 | Vehicle motion estimation method and system based on monocular continuous frame images |
Also Published As
Publication number | Publication date |
---|---|
CN114663496B (en) | 2022-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114663496B (en) | Monocular vision odometer method based on Kalman pose estimation network | |
CN114782691B (en) | Robot target identification and motion detection method based on deep learning, storage medium and equipment | |
CN109271933B (en) | Method for estimating three-dimensional human body posture based on video stream | |
CN107424177B (en) | Positioning correction long-range tracking method based on continuous correlation filter | |
Varma et al. | Transformers in self-supervised monocular depth estimation with unknown camera intrinsics | |
CN110490928A (en) | A kind of camera Attitude estimation method based on deep neural network | |
CN112233179B (en) | Visual odometer measuring method | |
CN114663509B (en) | Self-supervision monocular vision odometer method guided by key point thermodynamic diagram | |
CN110610486B (en) | Monocular image depth estimation method and device | |
CN113256698B (en) | Monocular 3D reconstruction method with depth prediction | |
CN111325784A (en) | Unsupervised pose and depth calculation method and system | |
CN110942484B (en) | Camera self-motion estimation method based on occlusion perception and feature pyramid matching | |
CN110428461B (en) | Monocular SLAM method and device combined with deep learning | |
CN114612545A (en) | Image analysis method and training method, device, equipment and medium of related model | |
CN111275751B (en) | Unsupervised absolute scale calculation method and system | |
CN115482252A (en) | Motion constraint-based SLAM closed loop detection and pose graph optimization method | |
Li et al. | Unsupervised joint learning of depth, optical flow, ego-motion from video | |
Fan et al. | Random epipolar constraint loss functions for supervised optical flow estimation | |
Liu et al. | Joint estimation of pose, depth, and optical flow with a competition–cooperation transformer network | |
CN114485417B (en) | Structural vibration displacement identification method and system | |
CN115830707A (en) | Multi-view human behavior identification method based on hypergraph learning | |
KR20200095251A (en) | Apparatus and method for estimating optical flow and disparity via cycle consistency | |
Jiang et al. | EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency | |
CN117197229B (en) | Multi-stage estimation monocular vision odometer method based on brightness alignment | |
KR20060065417A (en) | Marker-free motion capture apparatus and method for correcting tracking error |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||