CN114399547B - Monocular SLAM robust initialization method based on multiframe - Google Patents


Info

Publication number
CN114399547B
CN114399547B
Authority: CN (China)
Prior art keywords: representing, view, points, global, initial
Legal status: Active
Application number: CN202111499604.1A
Other languages: Chinese (zh)
Other versions: CN114399547A
Inventors: 胡德文, 葛杨冰
Current Assignee: National University of Defense Technology
Original Assignee: National University of Defense Technology
Priority date: 2021-12-09
Filing date: 2021-12-09
Application filed by: National University of Defense Technology
Publication of CN114399547A: 2022-04-26
Publication of CN114399547B (grant): 2024-01-02

Classifications

    • G: Physics; G06: Computing, calculating or counting; G06T: Image data processing or generation, in general
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T3/60: Rotation of whole images or parts thereof
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/579: Depth or shape recovery from multiple images, from motion
    • G06T2207/30244: Camera pose (indexing scheme for image analysis or image enhancement; subject or context of image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-frame-based monocular SLAM robust initialization method, which comprises the following steps: extracting feature points from the image frames of an initial video stream, matching the frames pairwise, and screening the matches to obtain initial matching point pairs; screening out three-view pairs according to the initial matching point pairs, further screening the matching points within the three-view pairs with a trifocal-tensor-based random sample consensus algorithm, and constructing a three-frame matching graph; solving the relative rotation between the image frames according to the principle of two-view geometry; solving the global rotations from the relative rotations between the image frames; solving the global displacements based on the global rotations; combining the global rotations and global displacements to obtain the initial pose of each frame, and performing nonlinear optimization adjustment from the initial pose; and calculating the depth of field of the feature points to recover their three-dimensional coordinates. The method improves the convergence speed and reduces the occurrence of scattered points, thereby improving the accuracy of the initial map.

Description

Monocular SLAM robust initialization method based on multiframe
Technical Field
The invention belongs to the technical field of monocular SLAM initialization and particularly relates to a multi-frame-based monocular SLAM robust initialization method.
Background
The goal of simultaneous localization and mapping (SLAM) is to reconstruct an unknown environment while estimating the motion trajectory of the camera. The technique is now widely applied in fields such as augmented reality and autonomous driving, and can run in real time without relying on external infrastructure.
Initialization is the key to monocular SLAM: it yields the initial pose of the camera and generates the initial map, providing support for the subsequent tracking phase.
The academic community currently initializes monocular SLAM systems mainly with incremental structure-from-motion (SfM): epipolar geometric constraints or planar structure constraints between two frames are used to construct a fundamental matrix or a homography matrix, the initial camera pose is obtained by decomposing that matrix, and the initial map is obtained by triangulation. This approach places high demands on the initial camera pose and on feature point matching; the initialization process depends on the camera's initial motion and cannot converge quickly. Late in initialization, bundle adjustment (BA) is performed to further optimize the initial pose and the initial map, but even after optimization there remain "scattered points" whose distance to their nearest three-dimensional feature point far exceeds the average spacing between ordinary three-dimensional feature points. These cause errors during tracking, and especially when image observation quality is poor they degrade subsequent pose tracking.
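For reference, the conventional two-frame initialization described above can be sketched as follows, assuming OpenCV (cv2) and NumPy; pts1/pts2 are matched pixel coordinates, K is the camera intrinsic matrix, and the homography branch and model selection are omitted for brevity, so this is an illustrative sketch rather than the exact prior-art pipeline.

```python
# A minimal sketch of two-frame initialization: essential-matrix estimation,
# pose recovery by decomposition, and triangulation of an initial map.
import cv2
import numpy as np

def two_frame_init(pts1, pts2, K):
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E into the relative pose (R, t) of frame 2 w.r.t. frame 1
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Triangulate an initial map from the two views
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    X = (X_h[:3] / X_h[3]).T                             # Nx3 initial map
    return R, t, X
```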
Disclosure of Invention
The invention aims to overcome the defects of the prior art, namely slow convergence and the presence of scattered points, and provides a SLAM initialization method that improves the convergence speed and reduces scattered points so as to improve the accuracy of the initial map, specifically a multi-frame-based monocular SLAM robust initialization method.
The invention provides a multi-frame-based monocular SLAM robust initialization method, which comprises the following steps:
S1: extracting feature points from each image frame in the initial video stream, matching the image frames pairwise according to the feature points, screening out matching points, and obtaining initial matching point pairs, wherein the matching points are matched feature points;
S2: screening out three-view pairs, namely triples of image frames with enough co-visible feature points, according to the initial matching point pairs, further screening the matching points within each three-view pair based on a trifocal-tensor random sample consensus algorithm, and constructing a three-frame matching graph, namely a topological graph describing the co-visibility relations among the image frames;
S3: solving the relative rotation between the image frames according to the principle of two-view geometry;
S4: solving the global rotations by an iteratively reweighted least squares method according to the relative rotations between the image frames;
S5: solving the global displacements based on the linear constraint relating the global rotation of each image frame, the scene structure and the global displacement;
S6: combining the global rotations and global displacements to obtain an initial pose for each frame, and, starting from the initial pose, optimizing the pose of each frame with a pose-only nonlinear optimization adjustment strategy;
S7: calculating the depth of field of the feature points and recovering the three-dimensional coordinates of the feature points.
Preferably, in S1, feature points of each image frame in the initial video stream are extracted, the image frames are matched pairwise according to the feature points, and the matching points are screened by a random sample consensus algorithm to obtain the initial matching point pairs, wherein the matching points are matched feature points.
Preferably, the specific steps of screening the matching points within a three-view pair are as follows:
S2.1: setting a sample set P whose minimum sample number is n, extracting n samples from the sample set P to form a sample subset S, and calculating an initial trifocal tensor from the inter-view essential matrices as the initialization model M;
S2.2: obtaining the projection matrices P1, P2 and P3 of the three views from the initial trifocal tensor, calculating the feature point coordinates by least squares, and obtaining three estimates of each feature point through the projection matrices P1, P2 and P3 respectively, namely:

$\hat{x}_1 = P_1 X, \qquad \hat{x}_2 = P_2 X, \qquad \hat{x}_3 = P_3 X$

wherein $\hat{x}_1$, $\hat{x}_2$ and $\hat{x}_3$ represent the estimates of the feature point under the projection matrices P1, P2 and P3 respectively, and X represents the three-dimensional coordinates of the feature point;
calculating the reprojection error over the three views, namely:

$\omega = d^2(x_1, \hat{x}_1) + d^2(x_2, \hat{x}_2) + d^2(x_3, \hat{x}_3)$

wherein ω represents the reprojection error, x_1, x_2 and x_3 represent the measured values of the feature point in views 1, 2 and 3, and d²(·,·) represents the square of the Euclidean distance between two elements;
taking the reprojection error ω as the error measure of the initialization model M, and forming an inlier set S* from the sample subset S together with those samples of the sample set P whose error with respect to the initialization model M is smaller than a set threshold th;
S2.3: calculating a new model M by least squares from the inlier set S*;
S2.4: repeating S2.1, S2.2 and S2.3 until the maximum consensus set is obtained, removing the outliers, and recording the inliers and the trifocal tensor of the current iteration, wherein the inliers are the matching points.
Preferably, in S5, the global displacement is solved based on the global rotation of each image frame and the linear constraint between the scene structure and the global displacement; the linear constraint is:

$B t_l + C t_i + D t_r = 0$

wherein

$B = [X_i]_\times R_{r,i} X_r \left([R_{r,l} X_r]_\times X_l\right)^T [X_l]_\times R_l$

$C = \left\|[X_l]_\times R_{r,l} X_r\right\|^2 [X_i]_\times R_i$

$D = -(B + C)$

$R_{r,i} = R_i R_r^T$

wherein t_l, t_i and t_r represent the global displacements of views l, i and r; X_l, X_i and X_r represent the normalized image coordinates of the feature point in views l, i and r; [·]_× represents the antisymmetric matrix corresponding to a vector; R_{r,i} and R_{r,l} represent the relative rotations between view r and views i and l; R_l, R_i and R_r represent the global rotations of views l, i and r; and T represents the transpose of a matrix;
considering all feature points and stacking all the linear constraints yields:

$F \cdot t = 0$

wherein F is the coefficient matrix composed of the blocks B, C and D, and $t = (t_1^T, t_2^T, \ldots, t_n^T)^T$ represents the global displacements of all n views;
solving this linear homogeneous equation yields the optimal value of the global displacement $\hat{t}$.
Preferably, in S6, the steps of the pose-only nonlinear optimization adjustment are:
S6.1: based on each three-view pair, calculating the reprojection vector, namely:

$\hat{x}_i = \dfrac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T \left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$

wherein $\hat{x}_i$ represents the reprojection vector, $e_3$ represents the vector $(0, 0, 1)^T$, $\hat{Z}_r$ is the depth of field of the feature point calculated from the reference view, X_r represents the normalized image coordinates of the feature point in view r, and t_{r,i} represents the relative displacement between views r and i, expressed as:

$t_{r,i} = R_i (t_r - t_i)$

wherein R_i represents the global rotation of view i, t_r represents the global displacement of view r, and t_i represents the global displacement of view i;
S6.2: calculating and summing the reprojection errors of all feature points to obtain an error term ε, whose expression is:

$\varepsilon = \sum_i \left(\hat{x}_i - x_i\right)^T \left(\hat{x}_i - x_i\right)$

wherein $\hat{x}_i$ represents the reprojection vector, x_i represents the measured value of the feature point in view i, and T represents the transpose of a matrix;
S6.3: optimizing through a general graph optimization library, taking the pose of each frame as a node of the graph optimization and the reprojection error of each feature point as an edge of the graph optimization.
Preferably, in S7, θ_{(r,j)} is taken as the criterion of feature point recovery quality, and the weighted depth Z_r of view r is calculated according to θ_{(r,j)}; the expression of the weighted depth Z_r is:

$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} Z_r^{(j)}$

$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$

$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$

wherein j represents the j-th view, $Z_r^{(j)}$ represents the depth of field calculated with view j, ω_{(r,j)} represents the weight, θ_{(r,j)} represents the feature point recovery quality, R_{r,j} represents the relative rotation between views r and j, X_r represents the normalized image coordinates of the feature point in view r, X_j represents the coordinates of the feature point in view j, and [·]_× represents the antisymmetric matrix of a vector;
according to the weighted depth Z_r, the initial map is reconstructed with weighting and the three-dimensional coordinates of the feature points are recovered, the recovered three-dimensional coordinates being expressed as:

$X_W = Z_r R_r X_r + t_r$

wherein X_W represents the recovered three-dimensional coordinates of the feature point, Z_r represents the weighted depth of view r, R_r represents the global rotation of view r, X_r represents the normalized image coordinates of the feature point in view r, and t_r represents the global displacement of view r.
Preferably, in S6.3, the method further comprises improving the robustness of the optimizer by setting a kernel function.
The beneficial effects are:
1. Compared with traditional systems that initialize from two frames of information, the method, by introducing a global structure-from-motion technique, makes fuller use of the video stream information when solving the initial pose and obtains a high-precision initial pose by averaging.
2. In triangulating the initial map, a weighted reconstruction technique comprehensively exploits multiple observations of each feature point, improving the accuracy of the feature points.
3. In the optimization, a nonlinear optimization strategy that takes only the initial poses as optimization variables reduces the number of scattered points in the initial map, further improving the accuracy of the initial map compared with the traditional nonlinear optimization strategy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of the multi-frame-based monocular SLAM robust initialization method in an embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be described clearly and completely below with reference to the drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the present invention.
As shown in Fig. 1, the present embodiment provides a multi-frame-based monocular SLAM robust initialization method, which includes the following steps.
S1: Extract feature points from each image frame in the initial video stream, match the image frames pairwise according to the feature points, screen the matching points with a random sample consensus (RANSAC) algorithm, and obtain the initial matching point pairs, the matching points being matched feature points. Then judge whether the feature points are sufficient: if so, proceed to S2; if not, repeat S1. A minimal sketch of this step is given below.
S2: Screen out three-view pairs, namely triples of image frames with enough co-visible feature points, according to the initial matching point pairs; further screen the matching points within each three-view pair with a trifocal-tensor-based random sample consensus algorithm; and construct the three-frame matching graph, namely a topological graph describing the co-visibility relations among the image frames.
With the essential matrix E as the initialization model M between two frames, a random sample consensus algorithm cannot remove all mismatched points; a random sample consensus algorithm that instead uses the trifocal tensor as the initialization model M across three frames removes the remaining mismatches and improves the matching precision. The specific steps of screening the matching points within a three-view pair are as follows:
S2.1: Set a sample set P whose minimum sample number is n, extract n samples from the sample set P to form a sample subset S, and calculate an initial trifocal tensor from the essential matrices between each pair of views as the initialization model M.
S2.2: Obtain the projection matrices P1, P2 and P3 of the three views from the initial trifocal tensor, calculate the feature point coordinates by least squares, and obtain three estimates of each feature point through the projection matrices P1, P2 and P3 respectively, namely:

$\hat{x}_1 = P_1 X, \qquad \hat{x}_2 = P_2 X, \qquad \hat{x}_3 = P_3 X$

where $\hat{x}_1$, $\hat{x}_2$ and $\hat{x}_3$ are the estimates of the feature point under the projection matrices P1, P2 and P3 respectively, and X is the three-dimensional coordinates of the feature point.
The reprojection error is then calculated over the three views, namely:

$\omega = d^2(x_1, \hat{x}_1) + d^2(x_2, \hat{x}_2) + d^2(x_3, \hat{x}_3)$

where ω is the reprojection error, x_1, x_2 and x_3 are the measured values of the feature point in views 1, 2 and 3, and d²(·,·) is the square of the Euclidean distance between two elements.
Taking the reprojection error ω as the error measure of the initialization model M, the inlier set S* is formed from the sample subset S together with those samples of the sample set P whose error with respect to the initialization model M is smaller than a set threshold th.
S2.3: If the inlier set accounts for more than 75% of the sample set, the correct model parameters are considered obtained, and a new model M is calculated by least squares from the inlier set S*.
S2.4: Repeat S2.1, S2.2 and S2.3 until the maximum consensus set is obtained, remove the outliers, and record the inliers and the trifocal tensor of the current iteration; the inliers are the matching points. A sketch of the S2.2 scoring is given below.
S3: Solve the relative rotation between the image frames according to the principle of two-view geometry.
S4: Calculate the global rotations by an iteratively reweighted least squares method (IRLS) based on the relative rotations between the image frames, as sketched below.
S5: Given accurate global rotations, a linear relation exists between the scene structure and the global displacements, and a linear global translation constraint can be constructed; the global displacements can therefore be solved directly based on the global rotation of each image frame and the linear constraint between the scene structure and the global displacement.
The linear constraint is:

$B t_l + C t_i + D t_r = 0$

where

$B = [X_i]_\times R_{r,i} X_r \left([R_{r,l} X_r]_\times X_l\right)^T [X_l]_\times R_l$

$C = \left\|[X_l]_\times R_{r,l} X_r\right\|^2 [X_i]_\times R_i$

$D = -(B + C)$

$R_{r,i} = R_i R_r^T$

in which t_l, t_i and t_r are the global displacements of views l, i and r; X_l, X_i and X_r are the normalized image coordinates of the feature point in views l, i and r; [·]_× is the antisymmetric matrix corresponding to a vector; R_{r,i} and R_{r,l} are the relative rotations between view r and views i and l; R_l, R_i and R_r are the global rotations of views l, i and r; and T denotes the transpose of a matrix.
Considering all feature points and stacking all the linear constraints gives:

$F \cdot t = 0$

where F is the coefficient matrix composed of the blocks B, C and D, and $t = (t_1^T, t_2^T, \ldots, t_n^T)^T$ is the global displacement of all n views.
Solving this linear homogeneous equation, e.g. by taking the right singular vector of F associated with its smallest singular value, yields the optimal value of the global displacement $\hat{t}$.
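A sketch of assembling and solving the stacked constraint F·t = 0, assuming NumPy; flattening the observations into (l, i, r, X_l, X_i, X_r, R_l, R_i, R_r) tuples is a bookkeeping assumption of this sketch, and skew() builds the antisymmetric matrix [v]_×.

```python
# S5 sketch: build the coefficient matrix F from the B, C, D blocks and
# solve F t = 0 up to scale.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0.0]])

def solve_global_translations(n_views, observations):
    rows = []
    for (l, i, r, X_l, X_i, X_r, R_l, R_i, R_r) in observations:
        R_ri = R_i @ R_r.T                              # relative rotations
        R_rl = R_l @ R_r.T
        a = skew(R_rl @ X_r) @ X_l                      # ([R_rl X_r]_x X_l)
        B = (skew(X_i) @ (R_ri @ X_r))[:, None] @ a[None, :] @ skew(X_l) @ R_l
        C = np.linalg.norm(skew(X_l) @ (R_rl @ X_r)) ** 2 * (skew(X_i) @ R_i)
        D = -(B + C)
        row = np.zeros((3, 3 * n_views))                # B t_l + C t_i + D t_r = 0
        row[:, 3*l:3*l+3] = B
        row[:, 3*i:3*i+3] = C
        row[:, 3*r:3*r+3] = D
        rows.append(row)
    F = np.vstack(rows)
    # fixing t_0 = 0 removes the trivial equal-translation null direction;
    # the remaining global scale is resolved up to norm by the SVD
    _, _, Vt = np.linalg.svd(F[:, 3:], full_matrices=False)
    return np.vstack([np.zeros((1, 3)), Vt[-1].reshape(n_views - 1, 3)])
```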
S6: Combine the global rotations and global displacements to obtain the initial pose of each frame, and, starting from the initial pose, optimize the pose of each frame with a pose-only nonlinear optimization adjustment strategy so that the reprojection error is minimized.
The pose-only nonlinear optimization adjustment comprises the following steps:
S6.1: Based on each three-view pair, calculate the reprojection vector, namely:

$\hat{x}_i = \dfrac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T \left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$

where $\hat{x}_i$ is the reprojection vector, $e_3$ is the vector $(0, 0, 1)^T$, $\hat{Z}_r$ is the depth of field of the feature point calculated from the reference view, X_r is the normalized image coordinates of the feature point in view r, and t_{r,i} is the relative displacement between views r and i, expressed as:

$t_{r,i} = R_i (t_r - t_i)$

where R_i is the global rotation of view i, t_r is the global displacement of view r, and t_i is the global displacement of view i.
S6.2: Calculate and sum the reprojection errors of all feature points to obtain the error term ε, whose expression is:

$\varepsilon = \sum_i \left(\hat{x}_i - x_i\right)^T \left(\hat{x}_i - x_i\right)$

where $\hat{x}_i$ is the reprojection vector, x_i is the measured value of the feature point in view i, and T denotes the transpose of a matrix.
S6.3: Optimize through a general graph optimization library, taking the pose of each frame as a node of the graph optimization and the reprojection error of each feature point as an edge of the graph optimization; the robustness of the optimizer is improved by setting a kernel function.
S7: Calculate the depth of field of the feature points by triangulation and recover the three-dimensional coordinates of the feature points.
Specifically, θ_{(r,j)} is taken as the criterion of feature point recovery quality, and the weighted depth Z_r of view r is calculated according to θ_{(r,j)}; the expression of the weighted depth Z_r is:

$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} Z_r^{(j)}$

$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$

$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$

where j denotes the j-th view, $Z_r^{(j)}$ is the depth of field calculated with view j, ω_{(r,j)} is the weight, θ_{(r,j)} is the feature point recovery quality, R_{r,j} is the relative rotation between views r and j, X_r is the normalized image coordinates of the feature point in view r, X_j is the coordinates of the feature point in view j, and [·]_× is the antisymmetric matrix of a vector.
According to the weighted depth Z_r, the initial map is reconstructed with weighting and the three-dimensional coordinates of the feature points are recovered as:

$X_W = Z_r R_r X_r + t_r$

where X_W is the recovered three-dimensional coordinates of the feature point, Z_r is the weighted depth of view r, R_r is the global rotation of view r, X_r is the normalized image coordinates of the feature point in view r, and t_r is the global displacement of view r.
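A sketch of this weighted reconstruction, assuming NumPy; the per-view depth $Z_r^{(j)} = \|[X_j]_\times t_{r,j}\| / \|[X_j]_\times R_{r,j} X_r\|$ is an assumption of the sketch (a standard two-view depth consistent with the θ_{(r,j)} weighting), since the patent fixes only θ_{(r,j)} and the weights ω_{(r,j)}.

```python
# S7 sketch: theta-weighted depth and 3D point recovery.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0.0]])

def weighted_depth(X_r, obs):
    """obs: list of (X_j, R_rj, t_rj) over all views j co-observing the point."""
    thetas, depths = [], []
    for X_j, R_rj, t_rj in obs:
        theta = np.linalg.norm(skew(X_j) @ (R_rj @ X_r))   # recovery quality
        Z_rj = np.linalg.norm(skew(X_j) @ t_rj) / max(theta, 1e-12)  # assumed
        thetas.append(theta)
        depths.append(Z_rj)
    w = np.array(thetas) / np.sum(thetas)                  # omega_{(r,j)}
    return float(w @ np.array(depths))                     # weighted depth Z_r

def recover_point(Z_r, R_r, X_r, t_r):
    # X_W = Z_r R_r X_r + t_r, per the text
    return Z_r * (R_r @ X_r) + t_r
```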
The multi-frame-based monocular SLAM robust initialization method provided by this embodiment has the following beneficial effects:
1. Compared with traditional systems that initialize from two frames of information, this embodiment, by introducing a global structure-from-motion technique into the monocular initialization system, makes fuller use of the video stream information when solving the initial pose and obtains a high-precision initial pose by averaging.
2. In triangulating the initial map, a weighted reconstruction technique comprehensively exploits multiple observations of each feature point, improving the accuracy of the initial map.
3. In the optimization, a nonlinear optimization strategy that takes only the initial poses as optimization variables reduces the number of scattered points in the initial map, further improving the accuracy of the initial map compared with the traditional nonlinear optimization strategy.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, or alternatives falling within the spirit and principles of the invention.

Claims (5)

1. A monocular SLAM robust initialization method based on multiple frames, characterized by comprising the following steps:
S1: extracting feature points from each image frame in an initial video stream, matching the image frames pairwise according to the feature points, screening out matching points, and obtaining initial matching point pairs, wherein the matching points are matched feature points;
specifically, extracting feature points from each image frame in the initial video stream, matching the image frames pairwise according to the feature points, screening the matching points through a random sample consensus algorithm, and obtaining the initial matching point pairs, wherein the matching points are matched feature points;
S2: screening out three-view pairs, namely triples of image frames with enough co-visible feature points, according to the initial matching point pairs, further screening the matching points within each three-view pair based on a trifocal-tensor random sample consensus algorithm, and constructing a three-frame matching graph, namely a topological graph describing the co-visibility relations among the image frames;
wherein the specific steps of screening the matching points within a three-view pair are as follows:
S2.1: setting a sample set P whose minimum sample number is n, extracting n samples from the sample set P to form a sample subset S, and calculating an initial trifocal tensor from the inter-view essential matrices as the initialization model M;
S2.2: obtaining the projection matrices P1, P2 and P3 of the three views from the initial trifocal tensor, calculating the feature point coordinates by least squares, and obtaining three estimates of each feature point through the projection matrices P1, P2 and P3 respectively, namely:

$\hat{x}_1 = P_1 X, \qquad \hat{x}_2 = P_2 X, \qquad \hat{x}_3 = P_3 X$

wherein $\hat{x}_1$, $\hat{x}_2$ and $\hat{x}_3$ represent the estimates of the feature point under the projection matrices P1, P2 and P3 respectively, and X represents the three-dimensional coordinates of the feature point;
calculating the reprojection error over the three views, namely:

$\omega = d^2(x_1, \hat{x}_1) + d^2(x_2, \hat{x}_2) + d^2(x_3, \hat{x}_3)$

wherein ω represents the reprojection error, x_1, x_2 and x_3 represent the measured values of the feature point in views 1, 2 and 3, and d²(·,·) represents the square of the Euclidean distance between two elements;
taking the reprojection error ω as the error measure of the initialization model M, and forming an inlier set S* from the sample subset S together with those samples of the sample set P whose error with respect to the initialization model M is smaller than a set threshold th;
S2.3: calculating a new model M by least squares from the inlier set S*;
S2.4: repeating S2.1, S2.2 and S2.3 until the maximum consensus set is obtained, removing the outliers, and recording the inliers and the trifocal tensor of the current iteration, wherein the inliers are the matching points;
S3: solving the relative rotation between the image frames according to the principle of two-view geometry;
S4: solving the global rotations by an iteratively reweighted least squares method according to the relative rotations between the image frames;
S5: solving the global displacements based on the linear constraint relating the global rotation of each image frame, the scene structure and the global displacement;
S6: combining the global rotations and global displacements to obtain an initial pose for each frame, and, starting from the initial pose, optimizing the pose of each frame with a pose-only nonlinear optimization adjustment strategy;
S7: calculating the depth of field of the feature points and recovering the three-dimensional coordinates of the feature points.
2. The monocular SLAM robust initialization method based on multiple frames according to claim 1, wherein in S5 the global displacement is solved based on the global rotation of each image frame and the linear constraint between the scene structure and the global displacement; the linear constraint is:

$B t_l + C t_i + D t_r = 0$

wherein

$B = [X_i]_\times R_{r,i} X_r \left([R_{r,l} X_r]_\times X_l\right)^T [X_l]_\times R_l$

$C = \left\|[X_l]_\times R_{r,l} X_r\right\|^2 [X_i]_\times R_i$

$D = -(B + C)$

$R_{r,i} = R_i R_r^T$

wherein t_l, t_i and t_r represent the global displacements of views l, i and r; X_l, X_i and X_r represent the normalized image coordinates of the feature point in views l, i and r; [·]_× represents the antisymmetric matrix corresponding to a vector; R_{r,i} and R_{r,l} represent the relative rotations between view r and views i and l; R_l, R_i and R_r represent the global rotations of views l, i and r; and T represents the transpose of a matrix;
considering all feature points and stacking all the linear constraints yields:

$F \cdot t = 0$

wherein F is the coefficient matrix composed of the blocks B, C and D, and $t = (t_1^T, t_2^T, \ldots, t_n^T)^T$ represents the global displacements of all n views;
solving this linear homogeneous equation yields the optimal value of the global displacement $\hat{t}$.
3. The monocular SLAM robust initialization method based on multiple frames according to claim 2, wherein in S6 the pose-only nonlinear optimization adjustment comprises the steps of:
S6.1: based on each three-view pair, calculating the reprojection vector, namely:

$\hat{x}_i = \dfrac{\hat{Z}_r R_{r,i} X_r + t_{r,i}}{e_3^T \left(\hat{Z}_r R_{r,i} X_r + t_{r,i}\right)}$

wherein $\hat{x}_i$ represents the reprojection vector, $e_3$ represents the vector $(0, 0, 1)^T$, $\hat{Z}_r$ is the depth of field of the feature point calculated from the reference view, X_r represents the normalized image coordinates of the feature point in view r, and t_{r,i} represents the relative displacement between views r and i, expressed as:

$t_{r,i} = R_i (t_r - t_i)$

wherein R_i represents the global rotation of view i, t_r represents the global displacement of view r, and t_i represents the global displacement of view i;
S6.2: calculating and summing the reprojection errors of all feature points to obtain an error term ε, whose expression is:

$\varepsilon = \sum_i \left(\hat{x}_i - x_i\right)^T \left(\hat{x}_i - x_i\right)$

wherein $\hat{x}_i$ represents the reprojection vector, x_i represents the measured value of the feature point in view i, and T represents the transpose of a matrix;
S6.3: optimizing through a general graph optimization library, taking the pose of each frame as a node of the graph optimization and the reprojection error of each feature point as an edge of the graph optimization.
4. The monocular SLAM robust initialization method based on multiple frames according to claim 3, wherein in S7, θ_{(r,j)} is taken as the criterion of feature point recovery quality, and the weighted depth Z_r of view r is calculated according to θ_{(r,j)}; the expression of the weighted depth Z_r is:

$Z_r = \sum_{1 \le j \le n} \omega_{(r,j)} Z_r^{(j)}$

$\omega_{(r,j)} = \theta_{(r,j)} \Big/ \sum_{1 \le j \le n} \theta_{(r,j)}$

$\theta_{(r,j)} = \left\|[X_j]_\times R_{r,j} X_r\right\|$

wherein j represents the j-th view, $Z_r^{(j)}$ represents the depth of field calculated with view j, ω_{(r,j)} represents the weight, θ_{(r,j)} represents the feature point recovery quality, R_{r,j} represents the relative rotation between views r and j, X_r represents the normalized image coordinates of the feature point in view r, X_j represents the coordinates of the feature point in view j, and [·]_× represents the antisymmetric matrix of a vector;
according to the weighted depth Z_r, the initial map is reconstructed with weighting and the three-dimensional coordinates of the feature points are recovered, the recovered three-dimensional coordinates being expressed as:

$X_W = Z_r R_r X_r + t_r$

wherein X_W represents the recovered three-dimensional coordinates of the feature point, Z_r represents the weighted depth of view r, R_r represents the global rotation of view r, X_r represents the normalized image coordinates of the feature point in view r, and t_r represents the global displacement of view r.
5. The monocular SLAM robust initialization method based on multiple frames according to claim 3, further comprising, in S6.3, improving the robustness of the optimizer by setting a kernel function.

Priority Applications (1)

CN202111499604.1A, priority date 2021-12-09, filed 2021-12-09: Monocular SLAM robust initialization method based on multiframe


Publications (2)

CN114399547A: 2022-04-26
CN114399547B: 2024-01-02

Family

ID: 81227588

Family Applications (1)

CN202111499604.1A (granted), filed 2021-12-09: Monocular SLAM robust initialization method based on multiframe, granted as CN114399547B

Country Status (1)

CN: CN114399547B

Families Citing this family (1)

CN117314735B (priority 2023-09-26, granted 2024-04-05), 长光辰英（杭州）科学仪器有限公司: Global optimization coordinate mapping conversion method based on minimized reprojection error

Citations (2)

CN108090958A (priority 2017-12-06, published 2018-05-29), 上海阅面网络科技有限公司: Robot simultaneous localization and map construction method and system
WO2020168668A1 (priority 2019-02-22, published 2020-08-27), 广州小鹏汽车科技有限公司: SLAM mapping method and system for vehicle

Family Cites Families (2)

US9286810B2 (priority 2010-09-24, published 2016-03-15), iRobot Corporation: Systems and methods for VSLAM optimization
CN110226184B (priority 2016-12-27, published 2023-07-14), 杰拉德·迪尔克·施密茨: System and method for machine perception


Non-Patent Citations (1)

刘勇: 基于影像的运动平台自定位测姿 (Image-based self-localization and attitude measurement for a moving platform), China Doctoral Dissertations Full-text Database, Basic Sciences series, full text.



Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant