CN114399547A - Monocular SLAM robust initialization method based on multiple frames - Google Patents


Info

Publication number: CN114399547A
Application number: CN202111499604.1A
Authority: CN (China)
Prior art keywords: representing, view, points, feature points, global
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114399547B (en)
Inventors: 胡德文, 葛杨冰
Current and original assignee: National University of Defense Technology
Priority date / filing date: 2021-12-09
Application filed by National University of Defense Technology
Publication of CN114399547A: 2022-04-26
Application granted; publication of CN114399547B: 2024-01-02

Classifications

    • G06T7/73 — Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T3/60 — Geometric image transformation in the plane of the image; rotation of a whole image or part thereof
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/579 — Depth or shape recovery from multiple images, from motion
    • G06T2207/30244 — Indexing scheme for image analysis or image enhancement; camera pose

Abstract

The invention discloses a multi-frame-based monocular SLAM robust initialization method comprising the following steps: extracting the feature points of the image frames in the initial video stream, matching them against one another, and screening out the matching points to obtain initial matching point pairs; screening out three-view pairs from the initial matching point pairs, further screening the matching points within each three-view pair with a trifocal-tensor-based random sample consensus algorithm, and constructing a three-frame matching graph; solving the relative rotation between image frames according to two-view geometry; solving the global rotations from the relative rotations between image frames; solving the global displacements based on the global rotations; combining the global rotations and displacements into the initial pose of each frame and applying nonlinear optimization adjustment according to the initial poses; and computing the depth of the feature points and recovering their three-dimensional coordinates. The method improves convergence speed and reduces the occurrence of scattered points, thereby improving the accuracy of the initial map.

Description

Monocular SLAM robust initialization method based on multiple frames
Technical Field
The invention belongs to the technical field of monocular SLAM initialization, and particularly relates to a monocular SLAM robust initialization method based on multiple frames.
Background
The goal of simultaneous localization and mapping (SLAM) is to reconstruct an unknown environment while estimating the motion trajectory of the camera. The technology is now widely applied in fields such as augmented reality and autonomous driving, and can run in real time without relying on external infrastructure.
Initialization is key to monocular SLAM: through initialization, the initial pose of the camera is obtained and an initial map is generated, providing support for the subsequent tracking stage.
At present, the academic community mainly initializes monocular SLAM systems with incremental structure from motion (SfM): the epipolar geometric constraint or planar structure constraint between two frames is used to construct a fundamental matrix or a homography matrix, the initial camera pose is obtained by decomposing that matrix, and the initial map is obtained by triangulation. This technique places high demands on the initial camera pose and on feature-point matching; the initialization process depends on the initial motion of the camera and cannot converge quickly. In the later stage of initialization, bundle adjustment (BA) is performed to further optimize the initial pose and the initial map, but "scattered points" remain after optimization, whose distance to their nearest three-dimensional feature points far exceeds the average distance between ordinary three-dimensional feature points. These points introduce errors into the tracking process and, especially when image observation quality is poor, degrade the subsequent pose tracking.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, namely slow convergence and the presence of scattered points, and provides a SLAM initialization method that improves convergence speed and reduces scattered points, thereby improving the accuracy of the initial map; in particular, a multi-frame-based monocular SLAM robust initialization method.
The invention provides a multi-frame-based monocular SLAM robust initialization method, comprising the following steps:
S1: extracting feature points from each image frame in the initial video stream, matching the image frames pairwise according to the feature points, and screening the matching points to obtain initial matching point pairs, the matching points being matched feature points;
S2: screening out three-view pairs, i.e. triples of image frames with enough co-visible feature points, according to the initial matching point pairs; further screening the matching points within each three-view pair with a trifocal-tensor-based random sample consensus algorithm; and constructing a three-frame matching graph, i.e. a topological graph describing the co-visibility relations among the image frames;
S3: solving the relative rotation between image frames according to two-view geometry;
S4: solving the global rotations from the relative rotations between image frames by iteratively reweighted least squares;
S5: solving the global displacements based on the global rotation of each image frame and the linear constraint relationship between the scene structure and the global displacement;
S6: combining the global rotations and the global displacements into the initial pose of each frame and, on this basis, optimizing the pose of each frame with a pose-only nonlinear optimization adjustment strategy;
S7: computing the depth of the feature points and recovering their three-dimensional coordinates.
Preferably, in S1, feature points of each image frame in the initial video stream are extracted, the image frames are matched pairwise according to the feature points, and the matching points are screened through a random sample consensus algorithm to obtain the initial matching point pairs, the matching points being matched feature points.
Preferably, the specific steps of screening the matching points in a three-view pair are as follows:
S2.1: for a sample set P with minimum sample number n, extract n samples from P to form a sample subset S, and compute an initial trifocal tensor from the essential matrices between views as the initialization model M;
S2.2: obtain the projection matrices P1, P2 and P3 from the initial trifocal tensor, compute the coordinates of the feature point by least squares, and obtain three estimates of the feature point through the projection matrices P1, P2 and P3 respectively:

$$\hat{x}_1 = P_1 X, \qquad \hat{x}_2 = P_2 X, \qquad \hat{x}_3 = P_3 X$$

where $\hat{x}_1$, $\hat{x}_2$ and $\hat{x}_3$ denote the estimates of the feature point under the projection matrices $P_1$, $P_2$ and $P_3$, and $X$ denotes the three-dimensional coordinates of the feature point;
the reprojection error is computed over the three-view pair, i.e.:

$$\omega = d^2(x_1, \hat{x}_1) + d^2(x_2, \hat{x}_2) + d^2(x_3, \hat{x}_3)$$

where $\omega$ denotes the reprojection error, $x_1$, $x_2$ and $x_3$ denote the measured values of the feature point in views 1, 2 and 3, and $d^2(\cdot,\cdot)$ denotes the squared Euclidean distance between two elements;
taking the reprojection error $\omega$ as the error measure of the initialization model M, the samples in P whose error with respect to M is smaller than a set threshold th form, together with the subset S, the inlier set S*;
S2.3: computing a new model M by least squares from the inlier set S*;
S2.4: repeating S2.1, S2.2 and S2.3 until the maximum consensus set is obtained, removing the outliers, and recording the inliers and the trifocal tensor of the current iteration, the inliers being the matching points.
Preferably, in S5, the global displacement is solved based on the global rotation of each image frame and the linear constraint relationship between the scene structure and the global displacement; the linear relation is:

$$B t_l + C t_i + D t_r = 0$$

where

$$B = [X_i]_\times R_{r,i} X_r \left([R_{r,l} X_r]_\times X_l\right)^T [X_l]_\times R_l,$$

$$C = \left\|[X_l]_\times R_{r,l} X_r\right\|^2 [X_i]_\times R_i,$$

$$D = -(B + C),$$

$$R_{r,i} = R_i R_r^T,$$

and where $t_l$, $t_i$ and $t_r$ denote the global displacements of views l, i and r; $X_l$, $X_i$ and $X_r$ denote the normalized image coordinates of the feature point in views l, i and r; $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector; $R_l$, $R_i$ and $R_r$ denote the global rotations of views l, i and r; $R_{r,i}$ and $R_{r,l}$ denote the relative rotations between views r, i and views r, l; and T denotes the matrix transpose.
Taking all feature points into account and stacking all the linear constraints yields:

$$F \cdot t = 0$$

where F is the coefficient matrix formed from B, C and D, and $t = (t_1^T, t_2^T, \ldots, t_n^T)^T$ denotes the global displacement of all n views.
Solving this linear homogeneous equation yields the optimal value of the global displacement $\hat{t}$.
Preferably, in S6, the pose-only nonlinear optimization adjustment comprises the steps of:
S6.1: based on each three-view pair, computing the reprojection vector, i.e.:

$$\hat{x}_i = \frac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T\left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$$

where $\hat{x}_i$ denotes the reprojection vector, $e_3$ denotes the vector $(0,0,1)^T$, $\hat{Z}_r$ is the depth of the feature point computed from the reference view, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_{r,i}$ denotes the relative displacement between views r and i, namely:

$$t_{r,i} = R_j\left(t_r - t_i\right)$$

where $R_j$ denotes the global rotation of view j, and $t_r$ and $t_i$ denote the global displacements of views r and i;
S6.2: computing and summing the reprojection errors of all feature points to obtain the error term ε:

$$\varepsilon = \sum_i \left(x_i - \hat{x}_i\right)^T\left(x_i - \hat{x}_i\right)$$

where $\hat{x}_i$ denotes the reprojection vector, $x_i$ denotes the measured value of the feature point in view i, and T denotes the matrix transpose;
S6.3: optimizing through a general graph optimization library, taking the pose of each frame as a node of the graph and the reprojection error of each feature point as an edge of the graph.
Preferably, in S7, $\theta_{(r,j)}$ is taken as the criterion of the recovery quality of a feature point, and the weighted depth $Z_r$ of view r is computed from $\theta_{(r,j)}$ as:

$$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} \hat{Z}_r^{(j)}$$

$$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$$

$$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$$

where j denotes the j-th view, $\hat{Z}_r^{(j)}$ denotes the depth recovered from views r and j, $\omega_{(r,j)}$ denotes the weight, $\theta_{(r,j)}$ denotes the recovery quality of the feature point, $R_{r,j}$ denotes the relative rotation between views r and j, $X_r$ denotes the normalized image coordinates of the feature point in view r, $X_j$ denotes the coordinates of the feature point in view j, and $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector;
the initial map is reconstructed by weighting according to the weighted depth $Z_r$, and the three-dimensional coordinates of the feature points are recovered as:

$$X_W = Z_r R_r X_r + t_r$$

where $X_W$ denotes the recovered three-dimensional coordinates of the feature point, $Z_r$ denotes the weighted depth of view r, $R_r$ denotes the global rotation of view r, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_r$ denotes the global displacement of view r.
Preferably, in S6.3, the method further includes increasing robustness of the optimizer by setting a kernel function.
Advantageous effects:
1. Compared with traditional systems that initialize from two frames of information, by introducing global structure-from-motion techniques the method can exploit more of the video stream when solving the initial pose, and obtains a high-precision initial pose by averaging.
2. During triangulation of the initial map, weighted reconstruction comprehensively exploits the multiple observations of each feature point, improving feature-point accuracy.
3. A nonlinear optimization strategy that takes only the initial poses as optimization variables is adopted in the optimization; compared with the traditional nonlinear optimization strategy, it reduces the number of scattered points in the initial map and thus further improves the accuracy of the initial map.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the multi-frame-based monocular SLAM robust initialization method in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, this embodiment provides a monocular SLAM robust initialization method based on multiple frames, comprising the steps of:
s1: extracting feature points of each image frame in the initial video stream, matching the image frames pairwise according to the feature points, screening matching points through a Random Sample Consensus (RANSAC) algorithm, and acquiring an initial matching point pair, wherein the matching points are matched feature points; judging whether enough characteristic points are possessed, if so, performing S2; otherwise, S1 is repeated.
S2: three-view pairs, i.e. triples of image frames with enough co-visible feature points, are screened out according to the initial matching point pairs; the matching points within each three-view pair are further screened with a trifocal-tensor-based random sample consensus algorithm; and a three-frame matching graph, i.e. a topological graph describing the co-visibility relations among the image frames, is constructed.
Between two frames, a random sample consensus algorithm with the essential matrix E as the initialization model M cannot eliminate all mismatched points; between three frames, a random sample consensus algorithm with the trifocal tensor as the initialization model M can further eliminate mismatched points and improve matching accuracy. The specific steps of screening the matching points in a three-view pair are therefore:
S2.1: for a sample set P with minimum sample number n, n samples are extracted from P to form a sample subset S, and an initial trifocal tensor is computed from the essential matrices between pairs of views as the initialization model M.
S2.2: obtaining projection matrixes P1, P2 and P3 of the trifocal tensor according to the initial trifocal tensor, calculating coordinates of the characteristic points by using a least square method, and respectively obtaining three estimated values of the characteristic points through the projection matrixes P1, P2 and P3, wherein the three estimated values are as follows:
Figure RE-GDA0003497279450000061
wherein the content of the first and second substances,
Figure RE-GDA0003497279450000062
representing the estimated values of the feature points under the action of the P1 projection matrix,
Figure RE-GDA0003497279450000063
representing the estimated values of the feature points under the action of the P2 projection matrix,
Figure RE-GDA0003497279450000064
representing the estimated value of the characteristic point under the action of a P3 projection matrix, wherein X represents the three-dimensional coordinate of the characteristic point;
the reprojection error is calculated from the three-view pairs, i.e.:
Figure RE-GDA0003497279450000065
where ω denotes the reprojection error, x1Representing the measured value, x, of the characteristic point in view 12Represents the measured value, x, of the feature point in view 23Representing the measured value of the characteristic point in view 3, d2(-) represents the square of the euclidean distance between two elements;
The reprojection error $\omega$ is taken as the error measure of the initialization model M; the samples in P whose error with respect to M is smaller than a set threshold th form, together with the subset S, the inlier set S*.
S2.3: if the inlier set accounts for more than 75% of the sample set, the model parameters are deemed correct, and a new model M is computed from the inlier set S* by least squares.
S2.4: S2.1, S2.2 and S2.3 are repeated until the maximum consensus set is obtained; the outliers are removed, and the inliers and the trifocal tensor of the current iteration are recorded, the inliers being the matching points.
S3: according to the double-view geometrical principle, the relative rotation between the image frames is solved.
S4: the global rotation is calculated by an iterative weighted Least squares (IRLS) method based on the relative rotation between the image frames.
S5: under the accurate global rotation background, a linear relation exists between the scene structure and the global displacement, and a linear global translation constraint can be constructed, so that the global displacement can be directly solved based on the global rotation of each image frame and the linear constraint relation between the scene structure and the global displacement;
wherein, the linear relation expression is:
Btl+Cti+Dtr=0
wherein the content of the first and second substances,
B=[Xi]×Rr,iXr([Rr,lXr]xXl)T[Xl]xRl
C=||[Xl]×Rr,lXr||2[Xi]×Ri
D=-(B+C)
Rr,i=RiRr T
wherein, tlRepresenting the global displacement, t, of view liRepresenting the global displacement, t, of view irRepresenting a global displacement, X, of view riNormalized image coordinates representing feature points in View i [ ·]×Representing an inverse-symmetric matrix of vector correspondences, Rr,iRepresenting a relative rotation between views r and i, XrRepresenting the normalized image coordinates, R, of the feature points in view RlRepresenting a global rotation of view l, RiRepresenting a global rotation, R, of view irRepresenting a global rotation, X, of the view rlRepresenting the normalized image coordinates of the feature points in view l, T representing the transpose of the matrix, Rr,lRepresenting relative rotation between views r and l;
considering all feature points, and combining all linear constraints, the following equation can be obtained:
F·t=0
where F is a coefficient matrix formed by B, C, D, and t is (t)1 T,t2 T,...,tn T)TRepresenting the global displacement of all n views.
Solving the linear homogeneous equation can obtain the optimal value of the global displacement
Figure RE-GDA0003497279450000071
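As a numerical illustration of S5 (a sketch with invented function names, not the patent's implementation), the stacked homogeneous system F·t = 0 can be solved by taking the right singular vector of F associated with its smallest singular value; the resulting t is determined only up to the usual gauge freedoms (overall scale and a common shift):

```python
# Sketch of S5: assemble the per-point constraints B t_l + C t_i + D t_r = 0
# into F and solve F t = 0 by SVD.
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def solve_global_translation(constraints, R, n_views):
    """constraints: list of (l, i, r, Xl, Xi, Xr) with the normalized image
    coordinates of one feature point in views l, i and reference view r;
    R: list of global rotations."""
    rows = []
    for (l, i, r, Xl, Xi, Xr) in constraints:
        R_ri = R[i] @ R[r].T                     # relative rotation r -> i
        R_rl = R[l] @ R[r].T                     # relative rotation r -> l
        B = (skew(Xi) @ R_ri @ Xr).reshape(3, 1) @ \
            (skew(R_rl @ Xr) @ Xl).reshape(1, 3) @ skew(Xl) @ R[l]
        C = np.linalg.norm(skew(Xl) @ R_rl @ Xr) ** 2 * skew(Xi) @ R[i]
        D = -(B + C)
        row = np.zeros((3, 3 * n_views))
        row[:, 3*l:3*l+3] = B
        row[:, 3*i:3*i+3] = C
        row[:, 3*r:3*r+3] = D
        rows.append(row)
    F = np.vstack(rows)
    _, _, Vt = np.linalg.svd(F)          # F t = 0: smallest singular vector
    return Vt[-1].reshape(n_views, 3)    # unit-norm stack of t_1 .. t_n
```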
S6: the global rotation and the global displacement are integrated to obtain the initial pose of each frame, and on the basis, the pose of each frame is optimized by using a pose-only nonlinear optimization adjustment strategy, so that the reprojection error is minimum;
wherein the pose-only nonlinear optimization adjustment comprises the steps of:
S6.1: based on each three-view pair, the reprojection vector is computed, i.e.:

$$\hat{x}_i = \frac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T\left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$$

where $\hat{x}_i$ denotes the reprojection vector, $e_3$ denotes the vector $(0,0,1)^T$, $\hat{Z}_r$ is the depth of the feature point computed from the reference view, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_{r,i}$ denotes the relative displacement between views r and i, namely:

$$t_{r,i} = R_j\left(t_r - t_i\right)$$

where $R_j$ denotes the global rotation of view j, and $t_r$ and $t_i$ denote the global displacements of views r and i;
S6.2: the reprojection errors of all feature points are computed and summed to obtain the error term ε:

$$\varepsilon = \sum_i \left(x_i - \hat{x}_i\right)^T\left(x_i - \hat{x}_i\right)$$

where $\hat{x}_i$ denotes the reprojection vector, $x_i$ denotes the measured value of the feature point in view i, and T denotes the matrix transpose;
S6.3: optimization is carried out through a general graph optimization library, taking the pose of each frame as a node of the graph and the reprojection error of each feature point as an edge of the graph, and the robustness of the optimizer is improved by setting a kernel function.
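A numpy sketch of the residual of S6.1-S6.2 that such a graph edge would evaluate is given below; the closed form of the reprojection vector is reconstructed from the symbols defined above, and the use of the rotation of view i in t_{r,i} (the text writes R_j) is an assumption of the sketch:

```python
# Sketch of the pose-only reprojection error of S6.1-S6.2 for one feature
# point observed at x_i in view i, transferred from reference view r.
import numpy as np

E3 = np.array([0.0, 0.0, 1.0])                  # the vector (0, 0, 1)^T

def reprojection_error(Z_r, R, t, r, i, X_r, x_i):
    """R, t: lists of global rotations/displacements; Z_r: depth of the
    feature point in the reference view; X_r, x_i: normalized coordinates."""
    R_ri = R[i] @ R[r].T                        # relative rotation r -> i
    t_ri = R[i] @ (t[r] - t[i])                 # relative displacement
    p = Z_r * (R_ri @ X_r) + t_ri               # point expressed in view i
    x_hat = p / (E3 @ p)                        # reprojection vector
    e = x_i - x_hat
    return e @ e                                # one summand of epsilon
```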
S7: the depth of the feature points is computed by triangulation, and their three-dimensional coordinates are recovered;
specifically: $\theta_{(r,j)}$ is taken as the criterion of the recovery quality of a feature point, and the weighted depth $Z_r$ of view r is computed from $\theta_{(r,j)}$ as:

$$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} \hat{Z}_r^{(j)}$$

$$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$$

$$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$$

where j denotes the j-th view, $\hat{Z}_r^{(j)}$ denotes the depth recovered from views r and j, $\omega_{(r,j)}$ denotes the weight, $\theta_{(r,j)}$ denotes the recovery quality of the feature point, $R_{r,j}$ denotes the relative rotation between views r and j, $X_r$ denotes the normalized image coordinates of the feature point in view r, $X_j$ denotes the coordinates of the feature point in view j, and $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector;
the initial map is reconstructed by weighting according to the weighted depth $Z_r$, and the three-dimensional coordinates of the feature points are recovered as:

$$X_W = Z_r R_r X_r + t_r$$

where $X_W$ denotes the recovered three-dimensional coordinates of the feature point, $Z_r$ denotes the weighted depth of view r, $R_r$ denotes the global rotation of view r, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_r$ denotes the global displacement of view r.
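A sketch of the weighted reconstruction of S7 follows. The per-pair depth Ẑ_r^(j) is obtained here from the standard two-view relation $Z_r [X_j]_\times R_{r,j} X_r + [X_j]_\times t_{r,j} = 0$ solved in least squares; this closed form, the camera-convention details, and the helper names are assumptions of the sketch:

```python
# Sketch of S7: weighted-depth triangulation and map-point recovery.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def weighted_point(r, views, R, t, X):
    """r: reference view; views: other views j observing the point;
    X[j]: normalized image coordinates of the point in view j."""
    thetas, depths = [], []
    for j in views:
        R_rj = R[j] @ R[r].T
        t_rj = R[j] @ (t[r] - t[j])
        a = skew(X[j]) @ R_rj @ X[r]
        theta = np.linalg.norm(a)               # recovery quality theta_(r,j)
        Z_j = -(a @ (skew(X[j]) @ t_rj)) / theta ** 2   # least-squares depth
        thetas.append(theta)
        depths.append(Z_j)
    w = np.array(thetas) / np.sum(thetas)       # weights omega_(r,j)
    Z_r = float(w @ np.array(depths))           # weighted depth of view r
    return Z_r * (R[r] @ X[r]) + t[r]           # X_W = Z_r R_r X_r + t_r
```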
The multi-frame-based monocular SLAM robust initialization method provided by this embodiment has the following beneficial effects:
1. Compared with traditional systems that initialize from two frames of information, this embodiment introduces global structure-from-motion techniques into the monocular initialization system, exploits more of the video stream when solving the initial pose, and obtains a high-precision initial pose by averaging.
2. During triangulation of the initial map, weighted reconstruction comprehensively exploits the multiple observations of each feature point, improving the accuracy of the initial map.
3. A nonlinear optimization strategy that takes only the initial poses as optimization variables is adopted in the optimization; compared with the traditional nonlinear optimization strategy, it reduces the number of scattered points in the initial map and thus further improves the accuracy of the initial map.
The present invention is not limited to the above preferred embodiments, and any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A monocular SLAM robust initialization method based on multiple frames, characterized by comprising the following steps:
S1: extracting feature points from each image frame in the initial video stream, matching the image frames pairwise according to the feature points, and screening the matching points to obtain initial matching point pairs, the matching points being matched feature points;
S2: screening out three-view pairs, i.e. triples of image frames with enough co-visible feature points, according to the initial matching point pairs, further screening the matching points within each three-view pair with a trifocal-tensor-based random sample consensus algorithm, and constructing a three-frame matching graph, i.e. a topological graph describing the co-visibility relations among the image frames;
S3: solving the relative rotation between image frames according to two-view geometry;
S4: solving the global rotations from the relative rotations between image frames by iteratively reweighted least squares;
S5: solving the global displacements based on the global rotation of each image frame and the linear constraint relationship between the scene structure and the global displacement;
S6: combining the global rotations and the global displacements into the initial pose of each frame and, on this basis, optimizing the pose of each frame with a pose-only nonlinear optimization adjustment strategy;
S7: computing the depth of the feature points and recovering their three-dimensional coordinates.
2. The multi-frame-based monocular SLAM robust initialization method as recited in claim 1, wherein in S1, feature points of each image frame in the initial video stream are extracted, the image frames are matched pairwise according to the feature points, and the matching points are screened through a random sample consensus algorithm to obtain the initial matching point pairs, the matching points being matched feature points.
3. The multi-frame-based monocular SLAM robust initialization method according to claim 2, wherein the specific steps of screening the matching points in a three-view pair are:
S2.1: for a sample set P with minimum sample number n, extracting n samples from P to form a sample subset S, and computing an initial trifocal tensor from the essential matrices between views as the initialization model M;
S2.2: obtaining the projection matrices P1, P2 and P3 from the initial trifocal tensor, computing the coordinates of the feature point by least squares, and obtaining three estimates of the feature point through the projection matrices P1, P2 and P3 respectively:

$$\hat{x}_1 = P_1 X, \qquad \hat{x}_2 = P_2 X, \qquad \hat{x}_3 = P_3 X$$

where $\hat{x}_1$, $\hat{x}_2$ and $\hat{x}_3$ denote the estimates of the feature point under the projection matrices $P_1$, $P_2$ and $P_3$, and $X$ denotes the three-dimensional coordinates of the feature point;
computing the reprojection error over the three-view pair, i.e.:

$$\omega = d^2(x_1, \hat{x}_1) + d^2(x_2, \hat{x}_2) + d^2(x_3, \hat{x}_3)$$

where $\omega$ denotes the reprojection error, $x_1$, $x_2$ and $x_3$ denote the measured values of the feature point in views 1, 2 and 3, and $d^2(\cdot,\cdot)$ denotes the squared Euclidean distance between two elements;
taking the reprojection error $\omega$ as the error measure of the initialization model M, the samples in P whose error with respect to M is smaller than a set threshold th forming, together with the subset S, the inlier set S*;
S2.3: computing a new model M by least squares from the inlier set S*;
S2.4: repeating S2.1, S2.2 and S2.3 until the maximum consensus set is obtained, removing the outliers, and recording the inliers and the trifocal tensor of the current iteration, the inliers being the matching points.
4. The method of claim 3, wherein in S5 the global displacement is solved based on the global rotation of each image frame and the linear constraint relationship between the scene structure and the global displacement, the linear relation being:

$$B t_l + C t_i + D t_r = 0$$

where

$$B = [X_i]_\times R_{r,i} X_r \left([R_{r,l} X_r]_\times X_l\right)^T [X_l]_\times R_l,$$

$$C = \left\|[X_l]_\times R_{r,l} X_r\right\|^2 [X_i]_\times R_i,$$

$$D = -(B + C),$$

$$R_{r,i} = R_i R_r^T,$$

and where $t_l$, $t_i$ and $t_r$ denote the global displacements of views l, i and r; $X_l$, $X_i$ and $X_r$ denote the normalized image coordinates of the feature point in views l, i and r; $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector; $R_l$, $R_i$ and $R_r$ denote the global rotations of views l, i and r; $R_{r,i}$ and $R_{r,l}$ denote the relative rotations between views r, i and views r, l; and T denotes the matrix transpose;
taking all feature points into account and stacking all the linear constraints yields:

$$F \cdot t = 0$$

where F is the coefficient matrix formed from B, C and D, and $t = (t_1^T, t_2^T, \ldots, t_n^T)^T$ denotes the global displacement of all n views;
solving this linear homogeneous equation yields the optimal value of the global displacement $\hat{t}$.
5. The multi-frame-based monocular SLAM robust initialization method of claim 4, wherein in S6 the pose-only nonlinear optimization adjustment comprises:
S6.1: based on each three-view pair, computing the reprojection vector, i.e.:

$$\hat{x}_i = \frac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T\left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$$

where $\hat{x}_i$ denotes the reprojection vector, $e_3$ denotes the vector $(0,0,1)^T$, $\hat{Z}_r$ is the depth of the feature point computed from the reference view, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_{r,i}$ denotes the relative displacement between views r and i, namely:

$$t_{r,i} = R_j\left(t_r - t_i\right)$$

where $R_j$ denotes the global rotation of view j, and $t_r$ and $t_i$ denote the global displacements of views r and i;
S6.2: computing and summing the reprojection errors of all feature points to obtain the error term ε:

$$\varepsilon = \sum_i \left(x_i - \hat{x}_i\right)^T\left(x_i - \hat{x}_i\right)$$

where $\hat{x}_i$ denotes the reprojection vector, $x_i$ denotes the measured value of the feature point in view i, and T denotes the matrix transpose;
S6.3: optimizing through a general graph optimization library, taking the pose of each frame as a node of the graph and the reprojection error of each feature point as an edge of the graph.
6. The method of claim 5, wherein in S7, $\theta_{(r,j)}$ is taken as the criterion of the recovery quality of a feature point, and the weighted depth $Z_r$ of view r is computed from $\theta_{(r,j)}$ as:

$$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} \hat{Z}_r^{(j)}$$

$$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$$

$$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$$

where j denotes the j-th view, $\hat{Z}_r^{(j)}$ denotes the depth recovered from views r and j, $\omega_{(r,j)}$ denotes the weight, $\theta_{(r,j)}$ denotes the recovery quality of the feature point, $R_{r,j}$ denotes the relative rotation between views r and j, $X_r$ denotes the normalized image coordinates of the feature point in view r, $X_j$ denotes the coordinates of the feature point in view j, and $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector;
the initial map is reconstructed by weighting according to the weighted depth $Z_r$, and the three-dimensional coordinates of the feature points are recovered as:

$$X_W = Z_r R_r X_r + t_r$$

where $X_W$ denotes the recovered three-dimensional coordinates of the feature point, $Z_r$ denotes the weighted depth of view r, $R_r$ denotes the global rotation of view r, $X_r$ denotes the normalized image coordinates of the feature point in view r, and $t_r$ denotes the global displacement of view r.
7. The method of claim 5, wherein S6.3 further comprises enhancing the robustness of the optimizer by setting a kernel function.
Application CN202111499604.1A | priority date 2021-12-09 | filing date 2021-12-09 | Monocular SLAM robust initialization method based on multiple frames | Active | granted as CN114399547B (en)

Priority Applications (1)

Application Number: CN202111499604.1A | Priority Date: 2021-12-09 | Filing Date: 2021-12-09 | Title: Monocular SLAM robust initialization method based on multiple frames

Publications (2)

Publication Number | Publication Date
CN114399547A | 2022-04-26
CN114399547B | 2024-01-02

Family ID: 81227588

Family Applications (1): CN202111499604.1A | filed 2021-12-09 | granted | Monocular SLAM robust initialization method based on multiple frames

Country Status (1): CN 114399547 B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
US20120121161A1 | 2010-09-24 | 2012-05-17 | Evolution Robotics, Inc. | Systems and methods for VSLAM optimization
US20180180733A1 | 2016-12-27 | 2018-06-28 | Gerard Dirk Smits | Systems and methods for machine perception
CN108090958A | 2017-12-06 | 2018-05-29 | 上海阅面网络科技有限公司 | Robot simultaneous localization and map construction method and system
WO2020168668A1 | 2019-02-22 | 2020-08-27 | 广州小鹏汽车科技有限公司 | SLAM mapping method and system for vehicle

Non-Patent Citations (1)

刘勇, 基于影像的运动平台自定位测姿 (Image-based self-localization and attitude measurement of a motion platform), 中国博士学位论文全文数据库 基础科学辑 (China Doctoral Dissertations Full-text Database, Basic Sciences)

Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
CN117314735A | 2023-09-26 | 2023-12-29 | 长光辰英(杭州)科学仪器有限公司 | Global optimization coordinate mapping conversion method based on minimized reprojection error
CN117314735B | 2023-09-26 | 2024-04-05 | 长光辰英(杭州)科学仪器有限公司 | Global optimization coordinate mapping conversion method based on minimized reprojection error (granted version)

Also Published As

CN114399547B (en), published 2024-01-02

Similar Documents

CN107301654B — Multi-sensor high-precision simultaneous localization and mapping method
CN106204574B — Camera pose self-calibration method based on object-plane motion features
US8953847B2 — Method and apparatus for solving position and orientation from correlated point features in images
CN109166149A — Positioning and three-dimensional wireframe reconstruction method and system fusing a binocular camera and an IMU
CN110443836A — Point cloud data automatic registration method and device based on planar features
CN110807809B — Lightweight monocular visual localization method based on point-line features and a depth filter
US10755139B2 — Random sample consensus for groups of data
CN101826206B — Camera self-calibration method
CN113256698B — Monocular 3D reconstruction method with depth prediction
CN110796694A — Real-time fruit three-dimensional point cloud acquisition method based on Kinect V2
CN103440659B — Starry-sky image distortion detection and estimation method based on star pattern matching
CN112419497A — Monocular-vision SLAM method combining the feature-based and direct methods
CN110009745B — Method for extracting planes from point clouds according to planar primitives and model driving
CN111899290A — Three-dimensional reconstruction method combining polarization and binocular vision
Nieto et al. — Non-linear optimization for robust estimation of vanishing points
CN112652020A — Visual SLAM method based on the AdaLAM algorithm
Yuan et al. — SDV-LOAM: Semi-direct visual-LiDAR odometry and mapping
CN103824294A — Method for aligning electronic cross-sectional image sequences
CN114399547A — Monocular SLAM robust initialization method based on multiple frames (this document)
Zhang et al. — Efficient Pairwise 3-D Registration of Urban Scenes via Hybrid Structural Descriptors
CN113506342B — SLAM omnidirectional loop correction method based on multi-camera panoramic vision
CN111079826A — Real-time construction-progress identification method fusing SLAM and image processing
Yao et al. — Registering oblique SAR images based on complementary integrated filtering and multilevel matching
CN111160362B — FAST feature homogenized extraction and inter-frame feature mismatch removal method
CN112288814A — Three-dimensional tracking registration method for augmented reality

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant