CN113720323B - Monocular vision inertial navigation SLAM method and device based on point-line feature fusion - Google Patents

Monocular vision inertial navigation SLAM method and device based on point-line feature fusion Download PDF

Info

Publication number
CN113720323B
CN113720323B (application CN202110873290.0A)
Authority
CN
China
Prior art keywords
camera
point
imu
representing
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110873290.0A
Other languages
Chinese (zh)
Other versions
CN113720323A (en)
Inventor
樊渊
潘高峰
宋程
潘天红
李腾
董翔
程松松
方笑晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202110873290.0A priority Critical patent/CN113720323B/en
Publication of CN113720323A publication Critical patent/CN113720323A/en
Application granted granted Critical
Publication of CN113720323B publication Critical patent/CN113720323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 Creation or updating of map data
    • G01C21/3833 Creation or updating of map data characterised by the source of data
    • G01C21/3841 Data obtained from two or more sources, e.g. probe vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a monocular vision inertial navigation SLAM method and device based on point-line feature fusion. The method comprises the following steps: taking images acquired by a monocular camera and environmental information acquired by an IMU as input information; obtaining a plurality of image frames during the camera's movement and taking the frames in which the point-line features are successfully matched as key frames; removing key frames whose number of feature points is lower than a first preset value, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing local maps whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps; repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment. The invention achieves higher accuracy and smaller trajectory error.

Description

Monocular vision inertial navigation SLAM method and device based on point-line feature fusion
Technical Field
The invention relates to the field of map construction, in particular to a monocular vision inertial navigation SLAM method and device based on point-line feature fusion.
Background
With the gradual development of SLAM (simultaneous localization and mapping) technology, vision-based SLAM algorithms have become a hotspot of current research. The point-feature-based monocular visual SLAM method uses a monocular camera as the sensor to detect and extract corner points of the surrounding environment. The extracted feature points are used to estimate the camera pose and build a 3D map, giving good performance and high computing speed. Relatively mature algorithms include PTAM, ORB-SLAM2, and the like. However, point-feature-based monocular visual SLAM requires sufficiently textured surroundings and does not perform well in low-texture scenes. Line-feature-based visual SLAM algorithms were proposed to address the lack or failure of point feature extraction in low-texture scenes. Such algorithms use line feature extraction and matching in the surrounding environment to localize the robot and map the environment. Line features can describe the edge information of the environment, enrich the geometric information, and describe low-texture environments well. These algorithms do not rely on the point features of the environment at all, but entirely on the extraction and matching of its line features. Mature techniques include LSD, LBD, and the like. However, visual SLAM based only on line features places heavy demands on computing resources, slows down computation, and loses its advantage in richly textured environments. Deep-learning-based dynamic visual SLAM algorithms use a neural network to identify dynamic objects, remove them, and then match and map the environment outside the dynamic objects. The main technique is to use a neural network (such as a CNN) to recognize the dynamic objects from which feature points were extracted, reject those features, and compute only with the remaining static objects; adding deep learning, however, increases computational complexity, and discriminating dynamic objects is relatively complicated. Given that the above methods are ill-suited to low-texture scenes or are computationally complex, the prior art has produced visual-inertial SLAM methods that fuse point and line features, improving positioning accuracy through point-line fusion and suiting low-texture scenes.
Chinese patent application publication No. CN109579840A discloses a tightly coupled binocular visual-inertial SLAM method with point-line feature fusion, comprising the steps of: determining the transformation relation between the camera coordinate system and the inertial sensor coordinate system; establishing a point-line feature and IMU tracking thread, solving the initial three-dimensional coordinates of the point-line features, predicting the positions of the feature points and lines using the IMU after IMU initialization, establishing correct feature data association, and solving the pose transformation between consecutive frames by combining the IMU and the point-line feature reprojection error terms; establishing a point-line feature and IMU local bundle adjustment thread, and optimizing the point-line three-dimensional coordinates, key frame poses and IMU state quantities within a local key frame window; and establishing a point-line feature loop detection thread, computing the bag-of-words model score with point-line feature weights to detect loops, and optimizing the global state quantities. That patent application can ensure stability and high precision when feature points are few and the camera moves rapidly, and is suitable for low-texture scenes; however, it weights the points and lines in the bag-of-words model, so errors caused by insufficient point-line extraction in the early stage accumulate and cause larger trajectory errors in the later stage.
Disclosure of Invention
The technical problem to be solved by the invention is that the trajectory error of prior-art visual-inertial SLAM methods based on point-line fusion is large, so that the finally generated map has low precision.
The invention solves the above technical problem by the following technical means: a monocular vision inertial navigation SLAM method based on point-line feature fusion, comprising the following steps:
s1: taking an image acquired by a monocular camera and environmental information acquired by an IMU as input information;
s2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
s3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
s4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
s5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
According to the invention, the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Further, before the step S1, the method further includes an IMU initialization, where the IMU initialization process is:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Further, the step S2 includes: extracting point features using the FAST corner extraction algorithm, extracting line features using the line segment detector (LSD) algorithm, and weighting the point and line features, with line features weighted higher than point features in a low-texture scene.
Further, the point-line feature matching process in step S2 is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Further, the step S2 of fusing the pose of the camera obtained in the two ways to obtain the camera moving process includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Further, repeating the camera trajectory to perform loop detection and loop correction in step S4 includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The invention also provides a monocular vision inertial navigation SLAM device based on point-line feature fusion, which comprises:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
Further, the information acquisition module further includes an IMU initialization, where the IMU initialization process is:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Further, the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
Further, the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Further, the camera moving process obtained by fusing the pose of the camera obtained in the two modes in the key frame obtaining module includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Further, repeating the camera trajectory to perform loop detection and loop correction in the detection and correction module includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The invention has the advantages that: the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Drawings
FIG. 1 is a flow chart of a monocular vision inertial navigation SLAM method based on point-line feature fusion disclosed in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relationship between projection lines and matching line segments in a monocular vision inertial navigation SLAM method based on point-line feature fusion according to an embodiment of the present invention;
fig. 3 is a schematic diagram of distances from the end points of the matching line segments to the projection line in the monocular vision inertial navigation SLAM method based on the point-line feature fusion according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, the monocular vision inertial navigation SLAM method based on the point-line feature fusion comprises the following steps:
s1: taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information; the camera and the IMU run in parallel, extracting image information and environment measurement information respectively, and the environment information of the camera and the IMU is then initialized in a tightly coupled manner. Tightly coupled means that the information collected by the two sensors is integrated together as the input information, rather than each being used alone as input information. It should be noted that, before step S1, the method further includes an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image.
Taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
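As a rough illustration of the cost structure above, the following Python sketch evaluates the negative-log-posterior once the prior residual and the per-frame pre-integration residuals with their covariances are available. It only shows how the terms are combined; the residuals themselves would come from the IMU pre-integration, and all function and variable names here are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def mahalanobis_sq(r, cov):
    """Squared Mahalanobis norm ||r||^2_cov = r^T cov^-1 r."""
    return float(r @ np.linalg.solve(cov, r))

def imu_init_cost(r_prior, cov_prior, preint_residuals, preint_covs):
    """Negative-log MAP objective used for IMU initialization:
    ||r_p||^2_{Sigma_p} + sum_i ||r_{I_{i-1,i}}||^2_{Sigma_{I_{i-1,i}}}.
    r_prior          : prior residual on (s, R_wg, b)
    preint_residuals : list of pre-integration residuals r_{I_{i-1,i}}
    preint_covs      : matching list of covariance matrices
    """
    cost = mahalanobis_sq(r_prior, cov_prior)
    for r, cov in zip(preint_residuals, preint_covs):
        cost += mahalanobis_sq(r, cov)
    return cost
```

Minimizing this cost over the state vector y_k (for example with a Levenberg-Marquardt solver) yields the initial scale, gravity direction, biases and velocities.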
S2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
in this embodiment, the FAST corner extraction algorithm is used to extract the point features, and the line segment detector LSD algorithm is used to extract the line features, where the point features are weighted and the weight of the line features is higher than that of the point features in the low texture scene. The FAST corner extraction algorithm belongs to the prior art, and the general process is as follows:
(1) Selecting a pixel p in the image, assuming its brightness is I p
(2) Setting a threshold T (e.g., I p 10%) of (c).
(3) Taking the pixel p as the center, 16 pixel points on a circle with the radius of 3 are selected.
(4) Assuming that the brightness of N consecutive points on the selected circle is greater than I p +T or less than I p T, then pixel p may be considered a point feature.
(5) The above four parts are cycled, and the same operation is performed for each pixel.
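As a minimal sketch of this extraction step, the code below uses OpenCV's built-in FAST and LSD detectors; the threshold value and file name are illustrative assumptions, and LSD availability depends on the OpenCV build.

```python
import cv2

# Minimal point/line feature extraction sketch (parameter values are illustrative).
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame

fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
point_features = fast.detect(img, None)                # FAST corner point features

lsd = cv2.createLineSegmentDetector()                  # LSD line features (build-dependent)
line_features, _, _, _ = lsd.detect(img)

print(len(point_features), "point features,",
      0 if line_features is None else len(line_features), "line features")
```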
When extracting point and line features, the basic premise is that the method operates under low texture, where point features are not easy to extract, so the point and line features are weighted. Since the invention targets low-texture scenes, more line features than point features are extracted; therefore, in the weighting process, line features are primary and point features secondary, and the effective point-line features extracted from the environment are weighted in this way. When point-line features are being extracted and no point features are obtained in consecutive frames, the map can still be initialized from the line features, so the orderly operation of the system is not affected. The invention thus divides the proportion of point and line features according to the number of line features, avoiding complex weight-distribution computation and ensuring the real-time performance of the system. The threshold on the number of line features is set to N.
The point and line feature weights w_p and w_l are assigned as follows:
N_l > N: w_p = 0.2, w_l = 0.8
where N_l denotes the number of successfully matched line features.
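A minimal sketch of this weight split follows. The description only states the case N_l > N, so the fallback branch for point-rich scenes is an assumed complement, not taken from the patent.

```python
def assign_pointline_weights(n_line_matches, n_threshold):
    """Point/line weight split: when the number of successful line matches N_l
    exceeds the threshold N, line features dominate (w_p = 0.2, w_l = 0.8)."""
    if n_line_matches > n_threshold:
        return 0.2, 0.8   # (w_p, w_l) as stated in the description
    return 0.8, 0.2       # assumed symmetric fallback for point-rich scenes
```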
The point-line feature matching process is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed.
before the line feature matching is performed, the feature lines in the space should be expressed mathematically, so that the computer description is facilitated. Let the homogeneous coordinates of two endpoints of the space straight line L be P 1 =[x 1 ,y 1 ,z 1 ,w 1 ] T And P 2 =[x 2 ,y 2 ,z 2 ,w 2 ] T The pllck matrix is defined asIts non-homogeneous coordinate is +>And->The linear plgram coordinates are:
where L is a 6 x 1 vector, comprising n and v, n being the moment vector of the straight line, v being the direction vector of the straight line, n being perpendicular to the plane determined by the origin and the straight line. w (w) 1 And w 2 Representing the homogeneous factor.
The conversion relationship between the plgram matrix T and the coordinates L is as follows:
wherein n is An antisymmetric matrix of moment vectors n representing straight lines.
Set up a linear transformation matrix H cw It is composed of a rotation matrixAnd translation vector->The composition is formed. L (L) c Is a straight line L in space w Converted from the spatial coordinate system to the coordinates of the camera coordinate system. Handle L c Expressed in terms of the plck coordinates:
wherein,representing an antisymmetric matrix of translation vectors.
Straight line L is set c Projection onto image plane as l c Its plck coordinates are expressed as:
wherein K is l Projection matrix representing straight line, f x 、f y 、c x 、c y Representing the focal length and image center inside the parameters inside the cameras (each camera has a fixed focal length and image center), n c Represents the normal vector in the expression of the pulgram, l 1 、l 2 、l 3 Represents straight line l c Coefficients of l c =l 1 x+l 2 y+l 3 z。
As shown in fig. 2 and 3, line features in adjacent frames of images obtained by the monocular camera are denoted as l, respectively c And l' c By the formulaAcquiring a reprojection error, and when the reprojection error of the current line characteristic of the current frame of the image and a certain line characteristic in the adjacent frame is minimum, successfully matching the line characteristic, wherein l is as follows 1 、l 2 Respectively represent straight lines l c X-direction coefficients and y-direction coefficients, +.>Representing l' c Respectively to l c Is a distance of (3).
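The sketch below walks through the standard Plücker-based pipeline described above (line transform, projection by K_l, and endpoint-to-line distances). It is an illustrative reconstruction under those standard conventions, not the patent's code; all function and variable names are assumptions.

```python
import numpy as np

def skew(v):
    """Antisymmetric (skew-symmetric) matrix [v]_x of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def plucker_from_endpoints(p1, p2):
    """Plücker coordinates (n, v): moment n = p1 x p2, direction v = p2 - p1 (w1 = w2 = 1)."""
    return np.cross(p1, p2), p2 - p1

def transform_line(n_w, v_w, R_cw, t_cw):
    """World-to-camera line transform: n_c = R n_w + [t]_x R v_w, v_c = R v_w."""
    return R_cw @ n_w + skew(t_cw) @ (R_cw @ v_w), R_cw @ v_w

def project_line(n_c, fx, fy, cx, cy):
    """Project the camera-frame line onto the image plane: l = K_l n_c = (l1, l2, l3)."""
    K_l = np.array([[fy, 0, 0],
                    [0, fx, 0],
                    [-fy * cx, -fx * cy, fx * fy]])
    return K_l @ n_c

def line_reprojection_error(l, p_s, p_e):
    """Distances of the matched segment's endpoints (homogeneous 2D points, last coord 1)
    to the projected line l."""
    denom = np.hypot(l[0], l[1])
    return np.array([p_s @ l, p_e @ l]) / denom
```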
The camera movement process obtained in step S2 by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process.
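A rough sketch of this fusion step follows, under the simplifying assumptions (not stated in the patent) that the two trajectories are already sampled at the same instants and that a pose is reduced to a 3D position; max_dev stands for the third preset value.

```python
import numpy as np

def fuse_trajectories(traj_cam, traj_imu, max_dev):
    """Reject instants where the camera and IMU trajectories disagree by more than
    max_dev, then average the remaining poses into one fused trajectory."""
    traj_cam, traj_imu = np.asarray(traj_cam), np.asarray(traj_imu)
    dev = np.linalg.norm(traj_cam - traj_imu, axis=1)   # per-instant pose deviation
    keep = dev <= max_dev                               # discard large-error poses
    return (traj_cam[keep] + traj_imu[keep]) / 2.0      # averaged camera movement
```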
S3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps; the BA optimization process is prior art, and the blog post "SLAM Fourteen Lectures — Back-end 1", uploaded by JInterest to the CSDN blog on 14 January 2020, discloses the specific BA optimization process as follows:
the world coordinates are converted to camera coordinates using the rotation matrix R and translation matrix t of the camera.
P′=Rp+t=[X′,Y′,Z′]
Will P And (3) casting the coordinate to a normalization plane to obtain a normalization coordinate:
taking the distortion condition of the normalized coordinates into consideration to obtain the original pixel coordinates before de-distortion
Calculating pixel coordinates according to the internal reference model:
the observation data of the camera in the moving process is recorded as:
z=h(x,y)
x refers to the pose of the camera, namely the corresponding transformation matrix T, y refers to the three-dimensional point p, and the observation error is as follows:
e=z-h(T,p)
taking into account observations at other moments. Let z be ij To be in pose T i Site observation road sign p j The resulting data, then the overall cost function is:
this least squares solution corresponds to the simultaneous adjustment of pose and landmark, the so-called BA.
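As an illustration of the cost function above, the sketch below evaluates the total reprojection error over a set of poses and landmarks (distortion omitted for brevity). How the cost is minimized over T_i and p_j (e.g. Gauss-Newton or Levenberg-Marquardt) is not shown; names and data layout are assumptions.

```python
import numpy as np

def project(T, p, fx, fy, cx, cy):
    """Pinhole projection h(T, p): world point p -> pixel (u, v), distortion ignored."""
    R, t = T[:3, :3], T[:3, 3]
    Pc = R @ p + t                                   # camera-frame point P'
    return np.array([fx * Pc[0] / Pc[2] + cx,
                     fy * Pc[1] / Pc[2] + cy])

def ba_cost(poses, landmarks, observations, fx, fy, cx, cy):
    """1/2 * sum_ij || z_ij - h(T_i, p_j) ||^2 over observations given as (i, j, z_ij)."""
    cost = 0.0
    for i, j, z in observations:
        e = z - project(poses[i], landmarks[j], fx, fy, cx, cy)
        cost += 0.5 * float(e @ e)
    return cost
```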
S4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map; repeating the camera trajectory for loop detection and loop correction includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
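A minimal sketch of the loop-detection query against the key frame database follows. The use of a global descriptor and cosine similarity is an assumption made for illustration (a bag-of-words score would be typical); the patent only states that a similar key frame is searched for and used as a closed-loop candidate.

```python
import numpy as np

def detect_loop(current_desc, keyframe_db, sim_threshold):
    """Compare the current key frame's global descriptor (assumed L2-normalised)
    against every entry in the key frame database and return the best candidate
    key frame id if it is similar enough, otherwise None."""
    best_id, best_sim = None, -1.0
    for kf_id, desc in keyframe_db.items():
        sim = float(np.dot(current_desc, desc))   # cosine similarity for unit vectors
        if sim > best_sim:
            best_id, best_sim = kf_id, sim
    return best_id if best_sim >= sim_threshold else None
```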
S5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
According to the above technical scheme, the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Example 2
Based on embodiment 1 of the present invention, embodiment 2 of the present invention further provides a monocular vision inertial navigation SLAM device based on dotted line feature fusion, the device comprising:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
Specifically, the information acquisition module further comprises an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Specifically, the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
Specifically, the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Specifically, the camera moving process obtained by fusing the pose of the camera obtained in the two modes in the key frame obtaining module includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Specifically, repeating the camera trajectory to perform loop detection and loop correction in the detection and correction module includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A monocular vision inertial navigation SLAM method based on point-line feature fusion, characterized by comprising the following steps:
s1: taking an image acquired by a monocular camera and environmental information acquired by an IMU as input information;
s2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames; the point-line feature matching process in step S2 is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c;
the camera movement process obtained in step S2 by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process;
s3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
s4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
s5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
2. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein the step S1 is preceded by an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
3. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein the step S2 includes: extracting point features using the FAST corner extraction algorithm, extracting line features using the line segment detector (LSD) algorithm, and weighting the point and line features, with line features weighted higher than point features in a low-texture scene.
4. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein repeating the camera trajectory to perform loop detection and loop correction in step S4 includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
5. A monocular vision inertial navigation SLAM device based on point-line feature fusion, characterized in that the device comprises:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames; the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c;
the camera movement process obtained in the key frame acquisition module by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
6. The monocular vision inertial navigation SLAM device based on point-line feature fusion according to claim 5, wherein the information acquisition module further includes an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
7. The monocular vision inertial navigation SLAM device based on point-line feature fusion according to claim 5, wherein the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
CN202110873290.0A 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion Active CN113720323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873290.0A CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110873290.0A CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Publications (2)

Publication Number Publication Date
CN113720323A CN113720323A (en) 2021-11-30
CN113720323B true CN113720323B (en) 2024-01-23

Family

ID=78674538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873290.0A Active CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Country Status (1)

Country Link
CN (1) CN113720323B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN112419497A (en) * 2020-11-13 2021-02-26 天津大学 Monocular vision-based SLAM method combining feature method and direct method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN112419497A (en) * 2020-11-13 2021-02-26 天津大学 Monocular vision-based SLAM method combining feature method and direct method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vision-IMU-based simultaneous localization and mapping algorithm for robots; Yao Erliang; Zhang Hexin; Zhang Guoliang; Xu Hui; Zhao Xin; Chinese Journal of Scientific Instrument (Issue 04); full text *

Also Published As

Publication number Publication date
CN113720323A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN109307508B (en) Panoramic inertial navigation SLAM method based on multiple key frames
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN107392964B (en) The indoor SLAM method combined based on indoor characteristic point and structure lines
CN111739063A (en) Electric power inspection robot positioning method based on multi-sensor fusion
CN107240129A (en) Object and indoor small scene based on RGB D camera datas recover and modeling method
CN112734841B (en) Method for realizing positioning by using wheel type odometer-IMU and monocular camera
CN106056664A (en) Real-time three-dimensional scene reconstruction system and method based on inertia and depth vision
Yin et al. Dynam-SLAM: An accurate, robust stereo visual-inertial SLAM method in dynamic environments
CN111998862A (en) Dense binocular SLAM method based on BNN
CN107563323A (en) A kind of video human face characteristic point positioning method
CN111623773B (en) Target positioning method and device based on fisheye vision and inertial measurement
CN114529576A (en) RGBD and IMU hybrid tracking registration method based on sliding window optimization
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN111882602A (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112179373A (en) Measuring method of visual odometer and visual odometer
CN112767546A (en) Binocular image-based visual map generation method for mobile robot
CN116772844A (en) Navigation method of visual inertial indoor robot based on dynamic environment
CN109544632B (en) Semantic SLAM object association method based on hierarchical topic model
CN112731503B (en) Pose estimation method and system based on front end tight coupling
CN112907633A (en) Dynamic characteristic point identification method and application thereof
CN116147618B (en) Real-time state sensing method and system suitable for dynamic environment
CN113720323B (en) Monocular vision inertial navigation SLAM method and device based on point-line feature fusion
CN112767481B (en) High-precision positioning and mapping method based on visual edge features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant