CN112381890B - RGB-D visual SLAM method based on point-line features - Google Patents

RGB-D visual SLAM method based on point-line features

Info

Publication number: CN112381890B (application CN202011358438.9A)
Authority: CN (China)
Prior art keywords: frame, camera, point, pose, key
Legal status: Active (granted)
Original language: Chinese (zh)
Other versions: CN112381890A (application publication)
Inventors: 方志军, 卢俊鑫, 高永彬, 杨淑群, 方荣辉
Original and current assignee: Shanghai University of Engineering Science
Application filed by Shanghai University of Engineering Science; priority to CN202011358438.9A; published as CN112381890A, then granted and published as CN112381890B.

Classifications

    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration (G06T7/00 Image analysis; G06T Image data processing or generation, in general; G06 Computing; calculating or counting; G Physics)
    • G01C21/005: Navigation; navigational instruments not provided for in groups G01C1/00-G01C19/00, with correlation of navigation data from several sources, e.g. map or contour matching (G01C Measuring distances, levels or bearings; surveying; navigation; G01 Measuring; testing; G Physics)
    • G06T7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light (G06T7/50 Depth or shape recovery)
    • G06T2207/20068: Projection on vertical or horizontal image axis (G06T2207/20 Special algorithmic details; G06T2207/00 Indexing scheme for image analysis or image enhancement)
    • G06T2207/30244: Camera pose (G06T2207/30 Subject of image; context of image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Optics & Photonics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an RGB-D visual SLAM method based on point-line features, which comprises the following steps: S1, calibrating the intrinsic parameters of the RGB-D camera; S2, extracting point and line features; S3, acquiring the initial transformation pose of the camera; S4, projecting key-frame lines onto the current frame; S5, forming the error function of the visual odometry module; S6, iteratively optimizing the initial transformation pose of the current-frame camera and solving the current-frame camera pose; S7, key-frame decision; S8, local BA optimization; S9, loop detection; S10, relocalization; S11, map construction; and S12, result output. The SLAM method provided by the invention uses an RGB-D camera as the image acquisition device and uses the points and edge lines of objects in the scene as localization features, and achieves better accuracy in challenging low-semantic, low-texture scenes.

Description

RGB-D visual SLAM method based on point-line features
Technical Field
The invention relates to an RGB-D visual SLAM method based on point-line features and belongs to the technical field of visual SLAM (simultaneous localization and mapping).
Background
SLAM (simultaneous localization and mapping) can be summarized as follows: a mobile device (such as a robot, an unmanned aerial vehicle or a mobile phone) starts from an unknown position in an unknown environment, observes and localizes its own position, attitude and trajectory with sensors (such as lidar or cameras) during motion, and incrementally builds a map based on its own position, thereby achieving simultaneous localization and mapping. SLAM technology has now been developed for more than 30 years, spans many related technical fields, and has become a popular research direction in robotics and computer vision.
Generally speaking, a SLAM system typically includes multiple sensors and multiple functional modules. Current robot SLAM systems, distinguished by their core functional module, generally take two forms: lidar-based SLAM (laser SLAM) and vision-based SLAM (visual SLAM, or VSLAM). Visual SLAM is currently a research hotspot in many research institutes; it uses a visual sensor as the carrier to obtain geometric, texture, color and other features of the environment for modeling, solves the motion state of the camera in the real world, and finally realizes autonomous navigation and mapping of a robot in an unknown environment.
Therefore, the extraction of visual features becomes the key to visual SLAM, and good visual features should have rotation invariance and scale invariance. In environments with rich semantic information, little illumination change and no dynamic objects, points, lines, surfaces and the like can be extracted to achieve high-precision self-localization. However, in low-semantic, strongly varying-illumination and dynamic environments such as white walls, floors and scenes with moving people, the precision of current visual SLAM algorithms drops sharply.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an RGB-D visual SLAM method based on point-line features, so as to address the low precision of existing visual SLAM technology in low-semantic, strongly varying-illumination and dynamic environments.
In order to achieve the above object, the invention adopts the following technical scheme:
An RGB-D visual SLAM method based on point-line features comprises the following steps:
S1, calibrating the intrinsic parameters of the RGB-D camera: using an RGB-D camera as the image acquisition device and calibrating its intrinsic parameters, which include the principal point, focal length and distortion coefficients of the camera;
S2, point and line feature extraction: performing edge detection and line fitting on the image frame acquired by the RGB-D camera at the current moment, screening the fitted lines and sampling line points, with the collected points forming a point set;
S3, acquiring the initial transformation pose of the camera: initializing the transformation pose for different image frames acquired by the RGB-D camera to obtain the camera's initial transformation pose;
S4, projecting key-frame lines onto the current frame: projecting the lines obtained in step S2 on the key frame onto the current frame according to the initial camera pose obtained in step S3;
S5, constructing the adaptation error based on the local gradient of line segments to form the error function of the visual odometry module: with the projected lines on the current frame from step S4, obtaining a fitness (adaptation degree) value for each line on the current frame from the relation between each projected line and the image data in its neighborhood, and accumulating the fitness values of all projected lines to form the error function of the visual odometry (VO) module;
S6, iterative optimization of the current-frame camera's initial transformation pose and solution of the current-frame camera pose: according to the error function finally formed in step S5, iteratively optimizing the camera's initial transformation pose with a nonlinear optimization method; the optimized transformation pose ξ_ck* between the current frame and the key frame is left-multiplied by the key-frame pose ξ_k to solve the current-frame camera pose ξ_c, thereby obtaining the specific position of the camera at the current-frame moment and realizing camera localization;
S7, key-frame decision: judging whether the current frame is a key frame based on the line fitness value of the current frame obtained in step S5; if the current frame is judged to be a key frame, it is stored as a new key frame, inserted into the back-end sliding window, and local BA optimization is performed; if the current frame is judged to be a non-key frame, no key-frame-related operation is performed and the method returns to step S2;
S8, local BA optimization: when the current frame is judged to be a key frame in step S7, it is inserted into the back-end local-BA sliding window, and a sliding-window-based co-visibility local BA optimization is constructed from the point and line features extracted in step S2, the camera transformation pose solved in step S6, all key frames in the sliding window, and the point and line features belonging to them;
S9, loop detection: performing loop detection and global graph optimization on the new key frame obtained in step S8 against all existing key frames, specifically: performing loop detection between each newly inserted key frame and all previous key frames based on a dictionary model; if a loop frame is detected, performing graph optimization on all key frames between the new key frame and the loop frame using the error function constructed by the visual odometry module in step S5;
S10, relocalization: when the camera transformation pose solved with the error function of the visual odometry module in step S6 deviates from the set threshold by too much, the pose solution of the current-frame camera is judged to have failed, and relocalization is performed; in the relocalization process, the loop detection of step S9 is first performed to search for the loop frame corresponding to the current frame, then the method jumps to step S3 to estimate the initial transformation pose, and the subsequent operations are executed in order;
S11, map construction: constructing a sparse map from the pose of each key frame and the point and line features obtained in the preceding steps;
S12, result output: storing the poses, drawing the camera's motion path, and storing the map with synchronous real-time display.
Preferably, in step S1, the intrinsic parameter matrix K of the RGB-D camera is calibrated using Zhang Zhengyou's camera calibration method.
Further preferably, in step S1, the calibration of the intrinsic parameters K of the RGB-D camera includes the following steps:
S11, using the RGB-D camera as the image acquisition device, shooting a fixed-size calibration board (checkerboard) from different angles to obtain the image data required for calibration;
S12, computing the camera intrinsic parameters K from the acquired image data using Zhang Zhengyou's camera calibration method, and storing the intrinsic coefficients (a minimal calibration sketch is given below).
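As an illustration of S11-S12, the following minimal Python sketch turns the checkerboard shots into the intrinsic matrix K and the distortion coefficients using OpenCV's cv2.calibrateCamera, which is based on Zhang's method; the board size, square size and file paths are hypothetical:

```python
import glob
import cv2
import numpy as np

# Hypothetical 9x6 inner-corner checkerboard with 25 mm squares.
BOARD = (9, 6)
SQUARE = 0.025

# 3-D corner coordinates on the board plane (Z = 0), identical for every shot.
board_pts = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
board_pts[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts, img_size = [], [], None
for path in glob.glob("calib/*.png"):           # shots of the board from different angles
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    img_size = gray.shape[::-1]                 # (width, height)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_pts.append(board_pts)
    img_pts.append(corners)

# K holds the focal lengths and principal point; dist holds the distortion coefficients.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, img_size, None, None)
np.savez("intrinsics.npz", K=K, dist=dist, rms=rms)
```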
Preferably, in step S2, the edge detection and line fitting of the video image frame acquired by the RGB-D camera are performed by first applying preprocessing operations including denoising and edge enhancement to the video image frame; then extracting edges from the preprocessed image with an edge detector; and finally fitting the extracted edges to obtain the required lines, screening the fitted lines and sampling line points, with the collected points forming a point set.
Further preferably, in step S2, the extraction of point and line features from the image frame acquired by the RGB-D camera at the current moment includes the following steps:
S21, first performing preprocessing operations including denoising and edge enhancement on the video image frame acquired by the RGB-D camera, and then extracting edges from the image with a Canny edge extractor;
S22, directly collecting collinear point clusters from the edges extracted in step S21 using an edge-linking and segmentation technique, and then fitting initial line segments in the point clusters with a least-squares fitting method;
S23, further extending and merging the initial line segments fitted in step S22 to generate longer and more complete segments, fitting the required lines, screening the fitted lines and sampling line points, with the collected points forming a point set.
Preferably, the acquisition of the initial camera pose in step S3 includes the following steps:
S31, first determining which kind of frame the image frame currently acquired by the RGB-D camera is, then, according to the result, executing the camera-pose initialization scheme of step S32, S33 or S34 for the corresponding image frame to obtain its initial transformation pose (this accelerates the iterative convergence of the subsequent error function to a certain extent and avoids falling into a local optimum);
S32, if the current frame is the first image frame, initializing the transformation pose with the identity matrix I;
S33, if the current frame is the second image frame or a relocalization image frame, solving the initial camera transformation pose with the perspective-n-point (PnP) method of geometric transformation using the point set collected in step S2;
S34, if the current frame is not the first image frame, the second image frame or a relocalization image frame, initializing the transformation pose with a continuous motion model.
Further preferably, step S33 includes the following operations:
S331, for the point sets of the current frame and the key frame obtained in step S2, defining a fixed-size region around each feature-point position and computing the gradient magnitude and direction of every pixel in the region to form the feature descriptor of each key point;
S332, computing the descriptor distance between every feature point of the key frame and the feature points of the current frame, sorting by descriptor distance, and taking the two points with the smallest descriptor distance as a point pair to determine the best matches;
S333, rejecting erroneous pairs among the obtained best matching pairs using RANSAC (random sample consensus);
S334, solving the initial camera transformation pose from the matching pairs obtained in S333 with the perspective-n-point (PnP) method of geometric transformation.
Further preferably, step S334 includes the following calculation:
S3341, projecting the matched points on the key frame into three-dimensional space according to the key-frame pose to obtain three-dimensional coordinate points P_i = (X_i, Y_i, Z_i)^T and, with the camera intrinsics K obtained in step S1 and the two-dimensional points p_i = (u_i, v_i)^T on the current frame corresponding to the three-dimensional points P_i, solving the camera transformation pose between the two frames

T = [ R  t ; 0  1 ],  R ∈ SO(3),  t ∈ R^3,

whose Lie-algebra form is ξ; the transformation relation between the current-frame pixel p_i and the three-dimensional point P_i is given by formula (1):

s_i · p_i = K · exp(ξ^) · P_i        (1)

where s_i is the scale (depth) factor of the current image frame and ξ^ is the antisymmetric matrix form of ξ;
S3342, because the camera transformation pose is unknown and noise is present, equation (1) does not hold exactly; the errors are therefore summed to construct a least-squares problem, and the error function is minimized to find the best initial camera transformation pose, as shown in formula (2):

ξ* = argmin_ξ (1/2) Σ_{i=1}^{n} || p_i − (1/s_i) · K · exp(ξ^) · P_i ||²        (2)

where ξ* is the initial camera transformation obtained by the solution, and the error term e_i = p_i − (1/s_i) · K · exp(ξ^) · P_i is called the reprojection error.
Preferably, in step S34, the camera transformation is initialized with the continuous motion model, specifically: the transformation pose ξ_{c-1,k} between the previous frame F_{c-1} of the current frame F_c (the newly arrived image frame, which the invention subsequently refers to as the current frame) and its corresponding key frame F_k (a key frame is a frame selected from a series of local ordinary frames to represent the local frames and record the local information; it carries distinctive feature information) is composed with the transformation pose ξ_{c-2,c-1} between the two previous frames F_{c-1} and F_{c-2} of the current frame F_c, giving the initial transformation pose ξ_ck^init of the current frame F_c, i.e., as shown in formula (3):

T_ck^init = T_{c-2,c-1} · T_{c-1,k}        (3)

where T = exp(ξ^) denotes the homogeneous transform corresponding to a pose ξ.
further preferably, step S34 includes the following operations:
s341, solving current frame F c Its previous frame F c-1 And its key frame F k Change the pose between
Figure GDA0003704659550000057
S342, solving the current frame F c First two frames F c-1 And F c-2 Position xi of transformation between c-2,c-1
S343, solving the current frame F c Initial pose transformation of
Figure GDA0003704659550000058
Solving the following equation (3):
Figure GDA0003704659550000059
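As an illustration of this constant-velocity initialization, the following minimal Python sketch composes the two transforms of formula (3) as 4x4 homogeneous matrices; the matrix representation and the composition order are assumptions consistent with the description above, not a verbatim reproduction of the patent's formula:

```python
import numpy as np

def initial_pose_motion_model(T_prev_key, T_prev2_prev):
    """Constant-velocity initial guess for the key-frame -> current-frame transform.

    T_prev_key   : 4x4 transform T_{c-1,k} between the previous frame and its key frame.
    T_prev2_prev : 4x4 inter-frame transform T_{c-2,c-1} between the two previous frames.
    """
    return T_prev2_prev @ T_prev_key   # assumed composition order of formula (3)
```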
preferably, in step S5, constructing an adaptive error based on the local gradient of the line segment to form an error function of the visual odometer module, includes the following steps:
s51, obtaining the projection straight lines on the current frame according to the step S4, and calculating the direction alpha of each projection straight line;
s52, sampling each projection straight line at equal intervals, wherein the number of the sampled points of each projection straight line satisfies the following formula:
Figure GDA00037046595500000510
wherein: n is the total number of sampling points of each projection line, L is the corresponding length of each projection line, and L is the total number of sampling points of each projection line s Represents the length of the shortest projected line in the current frame, L l The length of the longest projection straight line in the current frame is taken as the length of the longest projection straight line in the current frame;
s53, defining a neighborhood with fixed size by taking each sampling point as a center, and calculating gradient amplitudes and gradient directions corresponding to all pixel points in the neighborhood;
s54, screening neighborhood points according to the gradient amplitude and the gradient direction of the pixel points in the neighborhood of each sampling point obtained in the step S53, and when the gradient amplitude of the neighborhood points is lower than a set threshold value, considering that the neighborhood points do not contribute to straight lines and abandoning the points, or else, keeping the points;
s55, calculating the distance from each neighborhood point screened in the step S54 to the subordinate projection straight line thereof, and forming the weight value of each neighborhood point by setting that the distance meets the standard Gaussian normal distribution;
s56, subtracting the direction of the straight line from the gradient direction of all the pixel points in the neighborhood of each sampling point to obtain a direction difference value, if the direction difference value is larger than a set threshold value, further screening and rejecting the neighborhood points, and if the direction difference value is in the range of the set threshold value, multiplying the direction difference value by the gradient amplitude of the pixel points in the neighborhood to obtain an adaptive contribution value of each neighborhood point;
s57, multiplying the direction difference value calculated by all sampling points of each projection straight line solved in the step S56 by the weight value of the step S55, accumulating and summing to form the adaptation degree value of the straight line and the neighborhood image data;
and S58, accumulating the fit degree values of the projection straight lines obtained in the step S57 to form an error function of a Visual Odometer (VO) module.
Preferably, in step S7, the determination of whether the current frame is the key frame is performed based on the linear fitness value in step S5, specifically: and based on the error function formed by accumulating the adaptation degree values of each projection straight line obtained in the step S6, calculating the adaptation rate of the current frame according to an image frame adaptation rate calculation formula, and if the calculated adaptation rate of the current frame is higher than a preset threshold, determining that the current frame is a key frame.
Preferably, in step S8, the local BA optimization includes the following steps:
S81, when the current frame is judged to be the latest key frame in step S7, a sliding-window-based co-visibility local BA optimization is constructed from the point and line features extracted in step S2, the camera transformation pose obtained in step S6, all key frames in the sliding window, and the point and line features belonging to them; the pose nodes of the co-visibility graph are the poses of all key frames contained in the current sliding window, and the point and line landmarks are the co-visible landmarks of all key frames in the window; the co-visible landmarks between two key frames are computed using landmark-point matching based on ORB (Oriented FAST and Rotated BRIEF) features and line-feature matching based on the LBD (Line Band Descriptor) descriptor;
S82, because the sliding window has a fixed size, if the window is already full before the current new key frame is inserted, the old key frames and landmark points are marginalized (specifically, the old key frames and old landmarks are marginalized using the Schur complement);
S83, the reprojection errors of the point and line features in the co-visibility graph are solved separately, the two errors are combined to construct the error function of the back-end local BA optimization, and the Levenberg-Marquardt (L-M) method is then used for iterative optimization, optimizing the key-frame poses and co-visible landmarks in the co-visibility graph;
Preferably, in step S9, the loop detection and global graph optimization over all key frames obtained in step S8 includes the following steps:
S91, based on a dictionary model, first finding the corresponding word types in the dictionary K-ary tree from the descriptors of all point and line features of the current key frame, and computing the weight of each word to obtain the word vector of the current key frame;
S92, computing the similarity (e.g., an L1-norm score) between the word vector of the current key frame and the word vectors of other key frames to obtain the inter-frame similarity, and selecting the key frame whose similarity is highest and exceeds a preset threshold as the loop frame of the current key frame;
S93, for the loop frame obtained in step S92, solving the transformation pose between the loop frame and the current key frame with the error function of step S5;
S94, constructing a pose graph of all key frames within the loop from the transformation poses solved in step S93, where the nodes of the pose graph are the poses of the key frames and the edges are the co-visible landmark points between key frames, thereby realizing global pose-graph optimization.
Compared with the prior art, the invention has the following remarkable beneficial effects:
The invention provides an RGB-D visual SLAM method based on point-line features. It uses an RGB-D camera as the image acquisition device and uses the points and edge lines of objects in the scene as localization features to realize real-time localization and mapping (SLAM). The images acquired by the RGB-D camera are first processed by the visual odometry module (including intrinsic calibration of the RGB-D camera, point and line feature extraction, acquisition of the initial transformation pose of the camera, projection of key-frame lines onto the current frame, construction of the adaptation error from line-segment local gradients to form the visual-odometry error function, and iterative solution of the camera pose); local BA optimization is then performed on the key frames of the camera image stream, loop detection is further performed on the locally-BA-optimized key frames followed by global graph optimization, and a relocalization module is provided to restart the visual odometry after a failure. The whole SLAM method is real-time, robust and efficient; it fully combines points, lines and image data, reduces drift error, and improves the accuracy of visual SLAM in challenging scenes such as low-semantic and low-texture environments, achieving remarkable progress and unexpected effects compared with the prior art.
Drawings
FIG. 1 is a flow chart of the RGB-D visual SLAM method based on point-line features provided by the present invention;
FIG. 2 is a flow chart of the visual odometry module of the present invention;
FIG. 3 is an edge map extracted from an image frame according to the present invention;
FIG. 4 is a straight line obtained by fitting an edge map of an image frame according to the present invention;
FIG. 5 is a schematic diagram of points and lines extracted from an edge map according to the present invention;
FIG. 6 is a schematic diagram of the transformation of point and line features in an image and a three-dimensional world according to the present invention;
FIG. 7 is a schematic diagram of an image neighborhood generated by sampling points of each projection line in the present invention;
FIG. 8 is a diagram of the camera localization trajectory result of the present invention;
fig. 9 is a three-dimensional reconstruction result diagram of the present invention.
Detailed Description
The technical solution of the invention is described in further detail below with reference to specific embodiments.
Example 1
As shown in Figs. 1 and 2, the RGB-D visual SLAM method based on point-line features provided by the invention includes the following steps:
S1, calibrating the intrinsic parameters of the RGB-D camera: using the RGB-D camera as the image acquisition device and calibrating its intrinsic parameters, which include the principal point, focal length and distortion coefficients of the camera:
S11, using the RGB-D camera as the image acquisition device, shooting a fixed-size calibration board (checkerboard) from different angles to obtain the image data required for calibration; the invention adopts an RGB-D camera as the image acquisition device because it provides a pair of registered RGB and depth images in real time, and the depth image supplies accurate depth information for converting subsequent pixel coordinates into three-dimensional space;
S12, computing the camera intrinsic parameters K from the acquired image data using Zhang Zhengyou's camera calibration method, and storing the intrinsic coefficients.
S2, point and line feature extraction: performing edge detection and line fitting on the image frame acquired by the RGB-D camera at the current moment, further screening the fitted lines and sampling line points, with the collected points forming a point set:
S21, first performing preprocessing operations including denoising and edge enhancement on the video image frame acquired by the RGB-D camera, then extracting edges from the image with a Canny edge extractor whose low and high thresholds are set adaptively instead of using the fixed thresholds of the traditional Canny operator; the extracted edge map is shown in Fig. 3;
S22, directly collecting collinear point clusters from the edges extracted in step S21 using an edge-linking and segmentation technique, and then fitting initial line segments in the point clusters with a least-squares fitting method; traditional line-based RGB-D visual SLAM methods mainly use an existing LSD line-segment extractor or Hough line detector to extract line features, whereas the invention fits lines from the more robust edge features, which effectively addresses the poor extraction of feature points and feature lines by traditional SLAM in low-semantic, low-texture and varying-illumination environments;
S23, further extending and merging the initial line segments fitted in step S22 to generate longer and more complete segments and thereby fit the required lines; the finally fitted lines are shown in Fig. 4;
S24, collecting the endpoint features of the lines fitted in S23, each line yielding two point features; all line-feature endpoints are screened to obtain the feature point set of the method, and the extracted point-set features are shown in Fig. 5; a minimal sketch of this edge-based line extraction is given below.
s3, acquiring an initial pose transformation position of the camera:
s31, firstly, judging that the image frame acquired by the current RGB-D camera belongs to the frame, then respectively executing the camera pose initialization scheme in the step S32, the step S33 or the step S34 for different image frames according to the judgment result, and acquiring the initial transformation pose of the corresponding image frame (the iterative convergence speed of the subsequent error function can be accelerated to a certain extent, and the method is prevented from falling into a local optimal value);
s32, if the current frame image is the first frame image, the pose is transformed and initialized by using a unit array I;
s33, if the current frame is the second frame image or the repositioned image frame, using the point set collected in step S2 to solve the initial transformation pose of the camera by a perspective N-point projection (PNP) method in geometric transformation, where the projection transformation of the point and line features between the image frames is as shown in fig. 6:
s331, for the point sets of the current frame and the key frame obtained in the step S2, respectively defining a region with a fixed size around each feature point position, and calculating the gradient size and direction of each pixel point in the region to form a feature descriptor corresponding to each key point;
s332, calculating a descriptor distance between each feature point of the key frame and a feature point of the current frame, sequentially sorting according to the distance of the descriptors, extracting two points with the nearest descriptor distance as a point pair, and determining a best matching point; in the embodiment, the descriptors are sequentially sorted according to the distance, and the top sixty percent is taken as the best matching point;
s333, screening out error points of the obtained best matching point pairs by using RANSAC;
s334, firstly, the matching points on the key frame obtained in the S323 are projected to a three-dimensional space according to the posture of the key frame to obtain a three-dimensional space coordinate point P i =(X i ,Y i ,Z i ) T And solving the camera internal parameter K and the three-dimensional space point P obtained in the step 1 i Corresponding two-dimensional point p on the current frame i =(u i ,v i ) T Solving the transformation pose of the camera between two frames
Figure GDA0003704659550000091
R∈ SO(3),t∈R 3 The lie algebraic form of T is xi; wherein the current frame pixel point p i And three-dimensional space point P i The transformation relationship between the two is shown in formula (1):
Figure GDA0003704659550000092
in the formula (1), s i The xi ^ is an antisymmetric matrix of xi which is a scale factor of the current image frame;
s335, due to the fact that the transformation pose of the camera is unknown and noise exists, an error exists in the equation (1), therefore, the errors are summed to construct a least square problem, and then the error function is minimized to find the best initial transformation pose of the camera, as shown in the following equation (2):
Figure GDA0003704659550000101
in the formula (2), the first and second groups,
Figure GDA0003704659550000102
i.e. the initial transformation of the camera, error term, obtained by solving
Figure GDA0003704659550000103
Referred to as reprojection error;
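A simplified illustration of S331-S335 is given below; the brute-force matcher, the sixty-percent best-match cut, the depth-image back-projection helper and cv2.solvePnPRansac stand in for the descriptor matching, RANSAC screening and PnP solution described above, and the function and variable names are hypothetical:

```python
import cv2
import numpy as np

def backproject(pts_2d, depth, K):
    """Lift key-frame pixels (u, v) to 3-D camera coordinates using the depth image."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    out = []
    for u, v in pts_2d:
        z = float(depth[int(v), int(u)])
        out.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(out, dtype=np.float64)

def initial_pose_pnp(kf_pts, kf_desc, cur_pts, cur_desc, kf_depth, K):
    # S332: nearest-descriptor matching, keep the top 60% by descriptor distance.
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(kf_desc, cur_desc)
    matches = sorted(matches, key=lambda m: m.distance)[:int(0.6 * len(matches))]
    # S333 + S334: RANSAC outlier rejection inside solvePnPRansac, then the PnP solution.
    P3d = backproject([kf_pts[m.queryIdx] for m in matches], kf_depth, K)     # P_i
    p2d = np.array([cur_pts[m.trainIdx] for m in matches], dtype=np.float64)  # p_i
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(P3d, p2d, K, None,
                                                 reprojectionError=3.0)
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4); T[:3, :3] = R; T[:3, 3] = tvec.ravel()   # initial T_ck (key frame -> current)
    return ok, T
```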
s34, if the current frame is not the first, the second and the repositioning image frame, initializing the transformation pose based on the continuous motion model:
s341, solving current frame F c Its previous frame F c-1 And its key frame F k Change the pose between
Figure GDA0003704659550000104
S342, solving the current frame F c The first two frames F c-1 And F c-2 Position xi of transformation between c-2,c-1
S343, solving the current frame F c Initial pose transformation of
Figure GDA0003704659550000105
Solving the following equation (3):
Figure GDA0003704659550000106
s4, straight line projection of the key frame to the current frame: projecting the straight line on the key frame obtained in the step S2 to the current frame according to the initial pose of the camera obtained in the step S3;
s5, constructing an adaptive error based on the local gradient of the line segment to form an error function of the visual odometer module: obtaining projection straight lines on the current frame according to the step S4, obtaining the adaptation degree value of each straight line on the current frame by using the relation between the projection straight lines and the projection straight line neighborhood image data, and accumulating the adaptation degree value of each projection straight line to form an error function of a Visual Odometer (VO) module;
s51, obtaining the projection straight lines on the current frame according to the step S4, and calculating the direction alpha of each projection straight line;
s52, sampling each projection straight line at equal intervals, wherein the number of the sampled points of each projection straight line satisfies the following formula (4):
Figure GDA0003704659550000107
in the formula (4), n is the total number of sampling points of each projection line, L is the corresponding length of each projection line, and L is s Represents the length of the shortest projected line in the current frame, L l The length of the longest projection straight line in the current frame;
s53, defining a neighborhood with the size of Lx 2w by taking each sampling point as the center, and calculating all pixel points p in the neighborhood i (u i ,v i ) Corresponding gradient amplitude m i (u i ,v i ) Direction of gradient beta i (u i ,v i ) The image neighborhood schematic diagram generated by each projection straight line sampling point is shown in fig. 7, and the calculation formulas of the gradient amplitude and the gradient direction are as follows:
Figure GDA0003704659550000111
in the formula (5), A i (u i ,v i ) Represents a point p i (u i ,v i ) Change in luminosity in the x-axis direction, B i (u i ,v i ) Represents a point p i (u i ,v i ) Photometric change in the y-axis direction, I (u) i Θ1,v i Θ 1), Θ e { +, - } represents the point p i (u i Θ1,v i Θ 1) corresponding pixel values;
s54, screening neighborhood points according to the gradient amplitude and the gradient direction of the pixel points in the neighborhood of each sampling point obtained in the step S53, and when the gradient amplitude of the neighborhood points is lower than a set threshold value, considering that the neighborhood points do not contribute to straight lines and abandoning the points, or else, keeping the points; the contribution of the screened points to the projection adaptation degree of the straight line is different, not all pixel points in the rectangular neighborhood have the same weight to carry out adaptability evaluation, and the pixel points closer to the projection line segment should make more contribution to an adaptability evaluation function (FEF);
s55, calculating the distance d from each neighborhood point screened in the step S54 to the subordinate projection straight line i At a distance d i The setting of a normal gaussian distribution satisfying the criteria μ ═ 0 and σ ═ w forms the weight value G for each neighborhood point u,σ (d i );
S56, subtracting the line direction α from the gradient direction β of every pixel in each sample point's neighborhood to obtain a direction difference; if the direction difference is larger than a set threshold, the neighborhood point is further rejected; if it lies within the threshold, the direction difference is combined with the gradient magnitude m_i(u_i, v_i) of the neighborhood pixel to obtain the adaptation contribution of each neighborhood point, computed by formula (6):

E_pi = | m_i(u_i, v_i) · cos( β_i(u_i, v_i) − α ) |        (6)

where E_pi denotes the adaptation contribution of each neighborhood point to the line;
s57, multiplying the adaptive contribution values of all the sampling domain points of each projection straight line solved in the step S56 by the weight value G in the step S55 u,σ (d i ) And accumulating and summing to form the adaptation degree value of the straight line and the neighborhood image data, wherein the calculation formula is shown as a formula (7):
Figure GDA0003704659550000114
in the formula (7), the first and second groups,
Figure GDA0003704659550000115
is a projected straight line on the current frame,
Figure GDA0003704659550000116
the adaptive degree value corresponding to the projection straight line;
s58, accumulating the fit degree value of each projection straight line obtained in the step S57 to form an error function of a Visual Odometer (VO) module, wherein the obtained error function formula is shown as a formula (8):
Figure GDA0003704659550000117
in the formula (8), wherein E (ξ) ck ) Calculating the error of the current image frame transformation pose by the vision odometer; xi ck A lie algebra form for transforming the pose from the key frame to the current frame;
Figure GDA0003704659550000121
detecting straight lines on the key frames;
Figure GDA0003704659550000122
a projection function that is a function that projects a straight line from the key frame to the current frame;
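The per-line fitness of formula (7) can be sketched as follows; the Sobel gradients, neighborhood half-width w, sample count and thresholds are illustrative assumptions, and the function returns the fitness of one projected line (the VO error of formula (8) accumulates these over all projected key-frame lines):

```python
import cv2
import numpy as np

def line_fitness(gray, p1, p2, w=3, n_samples=20, mag_thresh=20.0, ang_thresh=np.pi / 4):
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)        # photometric change along x (A_i)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)        # photometric change along y (B_i)
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    alpha = np.arctan2(d[1], d[0])                # direction of the projected line (S51)
    normal = np.array([-d[1], d[0]]) / (np.linalg.norm(d) + 1e-9)

    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):    # equally spaced sample points (S52)
        s = p1 + t * d
        for du in range(-w, w + 1):               # neighborhood around the sample (S53)
            for dv in range(-w, w + 1):
                u, v = int(round(s[0] + du)), int(round(s[1] + dv))
                if not (0 <= v < gray.shape[0] and 0 <= u < gray.shape[1]):
                    continue
                m = np.hypot(gx[v, u], gy[v, u])
                if m < mag_thresh:                # S54: weak gradients contribute nothing
                    continue
                beta = np.arctan2(gy[v, u], gx[v, u])
                diff = np.arctan2(np.sin(beta - alpha), np.cos(beta - alpha))
                if abs(diff) > ang_thresh:        # S56: direction-difference check
                    continue
                dist = abs(np.dot(np.array([u, v]) - p1, normal))
                weight = np.exp(-dist**2 / (2 * w**2))      # S55: Gaussian distance weight
                score += weight * abs(m * np.cos(diff))     # S56-S57: E_pi contributions
    return score
```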
s6, iterative optimization of the initial pose of the current frame camera and pose solution of the current frame camera: according to the error function finally formed in the step S5, carrying out iterative optimization of the initial pose transformation of the camera by using a nonlinear error function optimization method; obtaining the transformation pose of the current frame and the key frame by optimization
Figure GDA0003704659550000123
Left-handed key frame pose xi k Solving the pose xi of the current frame camera c Therefore, the specific position of the camera at the current frame moment is realized, and the positioning of the camera is realized:
s61, constructing an optimization formula of the error function, as shown in formula (9):
Figure GDA0003704659550000124
in the formula (9), the reaction mixture,
Figure GDA0003704659550000125
representing the camera transformation pose obtained by iterative optimization,
Figure GDA0003704659550000126
as a function of error E (ξ) for visual odometry ck ) Minimizing, and solving the optimal transformation pose xi between the current frame and the key frame ck
S62, carrying out iterative optimization solution by adopting a Levenberg-Marquardt (Levenberg-Marquardt L-M) nonlinear optimization method, and realizing the positioning of the camera by updating the parameters of the camera, wherein the parameters of the camera at least comprise the position and the posture of the camera in a world coordinate system;
in steps S5 and S6 of this embodiment, a novel and effective error function is constructed by using a gradient relationship between a straight line and image data in its neighborhood, and accurate positioning of a mobile device and three-dimensional map reconstruction of peripheral edge structure information can be realized by nonlinear optimization iterative optimization; particularly, the construction error function is evaluated by the adaptation degree value formed by the line features and the image data which commonly exist in the environment, so that the drift error caused by the matching error of the past point features is well relieved, and the uncertainty of the aperture and the line segment end point of the straight line is solved by constructing the reprojection error no longer by the end point of the straight line;
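A pose-refinement sketch under stated assumptions: the se(3) update and Levenberg-Marquardt loop of S61-S62 are replaced here by a 6-vector pose (Rodrigues rotation plus translation) and a generic scipy minimizer, the hypothetical helpers project_keyframe_line and line_fitness sketched earlier are reused, and the sign flip assumes that a larger fitness means a better fit:

```python
import cv2
import numpy as np
from scipy.optimize import minimize

def pose_to_T(xi):
    """6-vector (Rodrigues rotation, then translation) -> 4x4 homogeneous transform."""
    R, _ = cv2.Rodrigues(np.asarray(xi[:3], dtype=np.float64))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = xi[3:]
    return T

def refine_pose(xi0, keyframe_lines, kf_depth, gray_cur, K):
    """Iteratively refine the key-frame -> current-frame pose (steps S61-S62)."""
    def cost(xi):
        T_ck = pose_to_T(xi)
        total = 0.0
        for p1, p2 in keyframe_lines:
            q1, q2 = project_keyframe_line(p1, p2, kf_depth, K, T_ck)  # step S4 sketch
            total += line_fitness(gray_cur, q1, q2)                    # formula (7) sketch
        return -total   # assumption: larger fitness = better alignment, so negate to minimize
    res = minimize(cost, xi0, method="Powell")   # stand-in for the L-M iteration of S62
    return pose_to_T(res.x), res
```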
s7, judging key frames: judging whether the current frame is a key frame or not based on the straight line adaptation degree value of the step S5: if the current frame is judged to be the key frame, replacing and storing the current frame as a new key frame, inserting the new key frame into a rear-end sliding window, and performing local BA optimization; if the current frame is judged to be a non-key frame, the relevant operation of the key frame is not carried out, and the step S2 is returned to:
regarding the current frame, the current frame is determined as a new key frame when any one of the following conditions is satisfied:
1) the maximum frame number limit for key frame replacement is reached or 25 frames have been processed from the last global relocation;
2) for the inter-frame transformation pose obtained in the step S6
Figure GDA0003704659550000127
Decomposing the rotation and translation components, solving the distance between the current frame and the key frame by using the components obtained by decomposition, and if the distance is greater than a set threshold value, determining the current frame as a new key frame;
3) the image-frame adaptation rate θ is computed from the error function of the current frame obtained in step S5 according to formula (10); if the adaptation rate θ obtained for the current frame is higher than a preset threshold (e.g., 0.85), the current frame is considered a new key frame; in formula (10), E_{k+j,k} is the visual odometry (VO) module error value obtained from the key frame F_k to the current frame F_{k+j}, E_{k+1,k} is the VO module error value obtained from the key frame F_k to its next frame F_{k+1}, and the lines detected on frames F_{k+j} and F_{k+1} also enter formula (10); a key-frame decision sketch is given below;
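An illustrative key-frame test combining the three conditions is sketched below; the distance threshold, the 25-frame limits and, in particular, the adaptation-rate expression are assumptions (formula (10) itself is not reproduced here, and a line-count-normalized error ratio is used as a stand-in):

```python
import numpy as np

def is_new_keyframe(frames_since_kf, frames_since_reloc, T_ck,
                    E_cur, n_lines_cur, E_first, n_lines_first,
                    dist_thresh=0.15, rate_thresh=0.85, max_gap=25):
    # Condition 1: frame-count limits.
    if frames_since_kf >= max_gap or frames_since_reloc >= max_gap:
        return True
    # Condition 2: translation magnitude of the key-frame -> current transform.
    if np.linalg.norm(T_ck[:3, 3]) > dist_thresh:
        return True
    # Condition 3: adaptation rate theta (illustrative stand-in for formula (10)):
    # VO error of the current frame vs. the error right after the key frame,
    # each normalized by the number of detected lines.
    theta = (E_cur / max(n_lines_cur, 1)) / (E_first / max(n_lines_first, 1) + 1e-9)
    return theta > rate_thresh
```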
s8, local BA optimization: when the current frame is judged to be the key frame in the step S7, inserting the current frame into the rear-end local BA optimization sliding window, and constructing the common-view local BA optimization based on the sliding window by using the point and line features extracted in the step S2 and the camera transformation pose solved in the step S6, all the key frames in the sliding window and the point and line features belonging thereto:
s81, when the current frame is judged to be the latest key frame in the step S7, point and line features extracted in the step S2 are utilized, the camera transformation pose obtained in the step S6 is utilized, all key frames in the sliding window and the point and line features to which the key frames belong construct a co-view local BA optimization based on the sliding window, then pose nodes in the co-view (co-visibility graph) are the corresponding poses of all key frames contained in the current sliding window, and all the point and line landmarks are the co-view landmarks of all the key frames in the sliding window; calculating a co-visual landmark between two key frames, and adopting landmark point matching based on an Oriented Fast and Rotated Brief (ORB) characteristic and Line characteristic matching based on a Line Band Descriptor (LBD) Descriptor;
s82, because the size of the sliding window is fixed, if the sliding window is full before the current new key frame is inserted, the marginalization of the old key frame and the road mark point is carried out (concretely, the marginalization of the old key frame, the old road mark and the road mark is carried out by using Schur supplement);
s83, re-projection errors of the point and line characteristics in the co-view are solved respectively, an error function of rear-end local BA optimization is constructed by combining the two errors, and then iterative optimization is carried out by adopting a Levenberg-Marquardt (L-M) method, so that the optimization of the pose of the key frame in the co-view and the co-view landmark is realized:
s831, solving for the reprojection error of the point characteristics according to the formula (11):
Figure GDA0003704659550000134
in formula (11), P W,j ∈R 3 Corresponding a three-dimensional coordinate for the jth characteristic point in the common view, wherein pi is a projection function and pi (xi) i ,P W,j ) Representing a projection P W,j Image planes to the ith key frame, e ij Then represents the projection point coordinates and the observation coordinates p i,j The difference in distance between;
s832, solving the reprojection error of the line characteristics according to the formula (12):
Figure GDA0003704659550000141
in the formula (12), wherein P W,k ,Q W,k Three-dimensional coordinates corresponding to two end points of the line feature respectively,
Figure GDA0003704659550000142
line coefficients for the kth line feature observed on the ith keyframe;
wherein l_{i,k} is given by formula (13):

l_{i,k} = p_{i,k}^s × p_{i,k}^e        (13)

in which p_{i,k}^s and p_{i,k}^e are the two-dimensional homogeneous image coordinates of the left and right endpoints of the k-th line feature on the i-th key frame;
s833, combination e ij And e ik An error function for the back-end local BA optimization of the invention is constructed, and is specifically shown in formula (14):
Figure GDA0003704659550000147
in equation (14), Ψ is a vector formed by variables optimized for the back-end local BA, and ρ is a robust huber loss function, Ω ij And Ω ik Is a covariance matrix corresponding to the point feature and the line feature,
Figure GDA0003704659550000148
and
Figure GDA0003704659550000149
common view set as key frameCollecting lines;
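The point and line residuals of formulas (11)-(13) can be sketched as follows (pinhole projection with homogeneous coordinates; the Huber weighting and covariance terms of formula (14) are left to the back-end solver, e.g. g2o or Ceres in a real system, and the normalization of the line coefficients is an assumption):

```python
import numpy as np

def project(K, T_iw, P_w):
    """pi(xi_i, P_W): project a world point into key frame i (pixel coordinates)."""
    P_c = (T_iw @ np.append(P_w, 1.0))[:3]
    p = K @ (P_c / P_c[2])
    return p[:2]

def point_residual(K, T_iw, P_w, p_obs):
    """e_ij of formula (11): observed pixel minus projected landmark."""
    return p_obs - project(K, T_iw, P_w)

def line_residual(K, T_iw, P_w, Q_w, p_start, p_end):
    """e_ik of formula (12): distances of the projected endpoints to the observed 2-D line,
    whose coefficients l_ik come from the cross product of the observed homogeneous
    endpoints (formula (13))."""
    l = np.cross(np.append(p_start, 1.0), np.append(p_end, 1.0))
    l = l / (np.linalg.norm(l[:2]) + 1e-9)        # normalize so l.x gives point-line distance
    eP = l @ np.append(project(K, T_iw, P_w), 1.0)
    eQ = l @ np.append(project(K, T_iw, Q_w), 1.0)
    return np.array([eP, eQ])
```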
s9, loop detection: performing loop detection and global graph optimization on the new key frame obtained in step S8 and all existing key frames, specifically: performing loop detection on each newly inserted key frame and all previous key frames based on a dictionary model, and if the loop frame is detected, performing graph optimization on all key frames between the new key frame and the loop frame by using an error function constructed by the visual odometry module in the step S5; accumulated drift errors can be further eliminated through loop detection and global graph optimization, and poses and landmark points are further optimized:
s91, based on the dictionary model, firstly, finding out the corresponding word type in the dictionary K-ary tree according to the descriptors corresponding to all points and line features of the current key frame;
counting the times of each word appearing in the picture, and calculating the weight corresponding to each word according to the formula (15):
Figure GDA00037046595500001410
in the formula (15), tf (I, I) t ) The corresponding weight of each word is represented,
Figure GDA00037046595500001411
indicating the number of times the word i appears in the current picture,
Figure GDA00037046595500001412
and (3) representing the number of all words in the picture, and calculating the word vector of the current key frame according to the calculated weight and the formula (16):
Figure GDA00037046595500001413
in the formula (16), the first and second groups,
Figure GDA0003704659550000151
the word vector representing the current key frame, tf (I, I) t ) To representThe weight of each word in the current keyframe, idf (i) represents the frequency of occurrence of each word in the lexicon;
s92, calculating the preselected similarity or L1 norm between the word vector of the current key frame and the word vectors of other key frames to obtain the similarity between frames, and selecting the key frame with the highest similarity and exceeding a preset threshold value as the loop frame of the current key frame;
s93, solving the transformation pose between the loop frame and the current key frame by adopting the error function in the step S5 according to the loop frame obtained by calculation in the step S92;
s94, constructing a pose graph of all key frames in a loop according to the transformation pose obtained by solving in the step S93 and a graph optimization error function shown in a formula (17), wherein nodes of the pose graph are poses of each key frame, and edges of the pose graph are co-view landmark points between the key frames, so that global pose graph optimization is realized:
γ i,jij )=log(exp(ξ i,j )·exp(ξ j -1 )·exp(ξ j )) (17)
xi in formula (17) i,j Is the transformation pose, xi, between key frame i and key frame j i And xi j Respectively obtaining camera poses of the key frame i and the key frame j after local BA optimization;
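The residual of formula (17) can be sketched with 4x4 homogeneous transforms and a matrix logarithm standing in for the se(3) log map; the composition order follows the reconstruction of formula (17) above and is convention-dependent:

```python
import numpy as np
from scipy.linalg import logm

def pose_graph_residual(T_ij, T_i, T_j):
    """gamma_ij of formula (17) for one pose-graph edge.

    T_ij : measured relative transform between key frames i and j (loop or odometry edge).
    T_i, T_j : key-frame poses after local BA optimization.
    Returns a 4x4 matrix in the Lie algebra; it is (close to) zero when the
    estimated relative pose agrees with the measurement."""
    return np.real(logm(T_ij @ np.linalg.inv(T_i) @ T_j))
```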
s10, repositioning: when the difference between the camera transformation pose solved by the error function of the visual odometer module in the step S6 and the set threshold value is too large, the pose solving of the current frame camera is judged to be failed, and repositioning is carried out; in the repositioning process, loop detection processing in the step S9 is firstly carried out, loop frames corresponding to the current frame are searched, then the step S3 is skipped to for estimating the initial transformation pose, and corresponding subsequent operations are continuously and sequentially executed;
s11, map construction: solving the pose of each key frame and the point and line characteristics obtained in the step to construct a sparse map;
s12, outputting the result: pose storage, path drawing and map storage and synchronous real-time display, wherein the path diagram obtained in the embodiment is shown in fig. 8, and the corresponding three-dimensional reconstruction map is shown in fig. 9.
Finally, it should be pointed out here that: the above is only a part of the preferred embodiments of the present invention and should not be construed as limiting the scope of the present invention, and the insubstantial modifications and adaptations of the present invention by those skilled in the art based on the above description are intended to be covered by the present invention.

Claims (10)

1. An RGB-D visual SLAM method based on point-line features, characterized by comprising the following steps:
S1, calibrating the intrinsic parameters of the RGB-D camera: using an RGB-D camera as the image acquisition device and calibrating its intrinsic parameters, which include the principal point, focal length and distortion coefficients of the camera;
S2, point and line feature extraction: performing edge detection and line fitting on the image frame acquired by the RGB-D camera at the current moment, further screening the fitted lines and sampling line points, with the collected points forming a point set;
S3, acquiring the initial transformation pose of the camera: initializing the transformation pose for different image frames acquired by the RGB-D camera to obtain the camera's initial transformation pose;
S4, projecting key-frame lines onto the current frame: projecting the lines obtained in step S2 on the key frame onto the current frame according to the initial camera pose obtained in step S3;
S5, constructing the adaptation error based on the local gradient of line segments to form the error function of the visual odometry module: with the projected lines on the current frame from step S4, obtaining a fitness value for each line on the current frame from the relation between each projected line and the image data in its neighborhood, and accumulating the fitness values of all projected lines to form the error function of the visual odometry module;
s6, iterative optimization of the initial pose of the current frame camera and pose solution of the current frame camera: according to the error function finally formed in the step S5, iterative optimization of the initial pose transformation of the camera is carried out by utilizing a nonlinear error function optimization method, and the current frame and the key obtained by optimizationPose transformation of frames
Figure FDA0003704659540000011
Left-handed key frame pose xi k Solving the pose xi of the current frame camera c Thereby acquiring the specific position of the camera at the current frame moment and realizing the positioning of the camera;
s7, judging key frames: judging whether the current frame is a key frame or not based on the straight line adaptation degree value of the current frame obtained in the step S5, if so, replacing and storing the current frame as a new key frame, inserting the new key frame into a rear-end sliding window, and performing local BA optimization; if the current frame is judged to be a non-key frame, the relevant operation of the key frame is not carried out, and the step S2 is returned;
S8, local BA optimization: when the current frame is judged to be a key frame in step S7, it is inserted into the back-end local BA optimization sliding window, and a co-visibility local BA optimization based on the sliding window is constructed using the point and line features extracted in step S2, the camera transformation pose solved in step S6, all the key frames in the sliding window and the point and line features belonging to them;
S9, loop detection: loop detection and global graph optimization are performed on the new key frame obtained in step S8 together with all existing key frames, specifically: loop detection between each newly inserted key frame and all previous key frames is performed based on a dictionary model, and if a loop frame is detected, graph optimization is performed on all key frames between the new key frame and the loop frame using the error function constructed by the visual odometry module in step S5;
S10, relocalization: when the camera transformation pose solved from the error function of the visual odometry module in step S6 deviates from the set threshold by too much, the pose solution of the current-frame camera is judged to have failed and relocalization is performed; in the relocalization process, the loop-detection processing of step S9 is first performed to find the loop frame corresponding to the current frame, then the method jumps to step S3 to estimate the initial transformation pose, and the corresponding subsequent operations are executed in order;
S11, map construction: a sparse map is constructed from the solved pose of each key frame and the point and line features obtained in the preceding steps;
S12, outputting the results: the poses are stored, the camera motion path is drawn, and the map is stored and displayed synchronously in real time.
2. The RGB-D visual SLAM method based on point-line features of claim 1, wherein: in step S1, the intrinsic parameter matrix K of the RGB-D camera is calibrated using a camera intrinsic calibration method.
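As an illustration of the intrinsic calibration referred to in claim 2, the following is a minimal Python/OpenCV sketch that estimates the camera matrix K (principal point and focal length) and the distortion coefficients from chessboard views; the board size, square size and image paths are assumptions made for the example, not part of the claimed method.

import glob
import cv2
import numpy as np

def calibrate_rgb_camera(image_glob="calib/*.png", board=(9, 6), square=0.025):
    """Estimate the camera matrix K and distortion coefficients from chessboard views."""
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts, size = [], [], None
    for path in glob.glob(image_glob):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    # Returns K (focal length and principal point) plus the distortion coefficients.
    rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist, rms

Any standard calibration routine that yields K and the distortion coefficients could be substituted here.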
3. The RGB-D visual SLAM method based on point-line features of claim 1, wherein: in step S2, the edge detection and straight-line fitting performed on the video image frame acquired by the RGB-D camera comprise: first, preprocessing operations including denoising and edge enhancement are performed on the video image frame acquired by the RGB-D camera; then, edge extraction is performed on the preprocessed image using an edge detector; finally, the extracted edges are fitted to obtain the required straight lines, the fitted lines are screened and line points are sampled on them, and the collected points form a point set.
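A minimal sketch of the preprocessing, edge-detection, line-fitting and line-point-sampling chain described in claim 3. The Gaussian blur, Canny detector and probabilistic Hough fitter are illustrative stand-ins for the unspecified detector and fitter of the patent, and all thresholds are assumptions.

import cv2
import numpy as np

def extract_line_points(bgr, min_len=30, step=5):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (3, 3), 0)            # preprocessing: denoising
    edges = cv2.Canny(gray, 50, 150)                     # edge detector
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 50,
                            minLineLength=min_len, maxLineGap=5)   # line fitting
    point_set = []
    for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
        length = np.hypot(x2 - x1, y2 - y1)
        n = max(2, int(length // step))                  # sample line points at intervals
        for t in np.linspace(0.0, 1.0, n):
            point_set.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return lines, np.array(point_set, dtype=np.float32)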
4. The RGB-D visual SLAM method according to claim 1, wherein obtaining the initial transformation pose of the camera in step S3 comprises the following steps:
S31, first, determining which kind of frame the image frame currently acquired by the RGB-D camera is; then, according to the result, the camera pose initialization scheme of step S32, step S33 or step S34 is executed for the different image frames respectively, to obtain the initial transformation pose of the corresponding image frame;
S32, if the current frame is the first image frame, the transformation pose is initialized with the identity matrix I;
S33, if the current frame is the second image frame or a relocalization image frame, the initial transformation pose of the camera is solved from the point set acquired in step S2 by the perspective-n-point projection method in geometric transformation;
S34, if the current frame is not the first image frame, the second image frame or a relocalization image frame, the transformation pose is initialized based on a continuous motion model.
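A minimal sketch of the frame-type dispatch of steps S31-S34 in claim 4. The helper names, the PnP callback and the use of 4x4 homogeneous matrices are assumptions made for illustration only.

import numpy as np

def initial_transform(frame_idx, is_reloc_frame, solve_pnp=None,
                      T_c2_c1=None, T_k_c1=None):
    if frame_idx == 0:                       # S32: first frame -> identity matrix I
        return np.eye(4)
    if frame_idx == 1 or is_reloc_frame:     # S33: perspective-n-point from point matches
        return solve_pnp()
    # S34: continuous motion model (see claim 7): compose the last inter-frame
    # motion with the previous frame's key-frame-relative pose.
    return T_c2_c1 @ T_k_c1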
5. The RGB-D visual SLAM method according to claim 4, wherein the step S33 includes the following operations:
S331, for the point sets of the current frame and the key frame obtained in step S2, a region of fixed size is defined around each feature-point position, and the gradient magnitude and direction of each pixel in the region are calculated to form the feature descriptor corresponding to each key point;
S332, the descriptor distance between each feature point of the key frame and the feature points of the current frame is calculated, the descriptors are sorted by distance, the two points with the smallest descriptor distance are taken as a point pair, and the best matching points are determined;
S333, erroneous point pairs are rejected using RANSAC;
S334, using the matching point pairs obtained in S333, the initial transformation pose of the camera is solved by the perspective-n-point projection method in geometric transformation.
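A hedged sketch of steps S331-S334 in claim 5 using OpenCV. The patent's descriptor is built from gradient magnitudes and directions around sampled line points; ORB is substituted here purely so the example runs with standard library calls, while the RANSAC rejection and perspective-n-point solution follow the claimed order.

import cv2
import numpy as np

def initial_pose_pnp(key_gray, cur_gray, key_depth, K):
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(key_gray, None)     # S331: descriptors (ORB stand-in)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)   # S332
    obj_pts, img_pts = [], []
    for m in matches[:200]:
        u, v = kp1[m.queryIdx].pt
        z = key_depth[int(v), int(u)]
        if z <= 0:
            continue
        # Back-project the key-frame point to 3-D using depth and intrinsics.
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        obj_pts.append([x, y, z])
        img_pts.append(kp2[m.trainIdx].pt)
    obj_pts = np.array(obj_pts, np.float32)
    img_pts = np.array(img_pts, np.float32)
    # S333 + S334: RANSAC rejects outlier pairs, PnP solves the initial pose.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T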
6. The RGB-D visual SLAM method according to claim 5, wherein the step S334 includes the following operations:
S3341, the matching points on the key frame are projected into three-dimensional space according to the key-frame pose, giving three-dimensional space points P_i = (X_i, Y_i, Z_i)^T; from the camera intrinsics K obtained in step S1, the three-dimensional points P_i and the corresponding two-dimensional points p_i = (u_i, v_i)^T on the current frame, the camera transformation pose between the two frames, T = [R | t] with R ∈ SO(3) and t ∈ R^3, is solved, and the Lie-algebra form of T is ξ; the transformation relationship between the current-frame pixel point p_i and the three-dimensional space point P_i is given by formula (1):

s_i · p_i = K · exp(ξ^) · P_i      (1)

in formula (1), s_i is the scale factor of the current image frame and ξ^ is the antisymmetric matrix of ξ;

S3342, because the camera transformation pose is unknown and noise is present, formula (1) is satisfied only up to an error; the errors are therefore summed to construct a least-squares problem, and the error function is minimized to find the best initial camera transformation pose, as shown in formula (2):

ξ* = argmin_ξ (1/2) Σ_i || p_i − (1/s_i) · K · exp(ξ^) · P_i ||²      (2)

in formula (2), ξ* is the initial camera transformation obtained by the solution, and the error term p_i − (1/s_i) · K · exp(ξ^) · P_i is called the reprojection error.
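A small numerical sketch of the least-squares problem of formulas (1)-(2). The pose is parametrized by a rotation vector and a translation instead of the Lie-algebra element ξ, and scipy's generic least-squares solver stands in for the nonlinear optimization named in the claim; both substitutions are assumptions made for illustration.

import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(pose6, P, p, K):
    """pose6 = (rotation vector, translation); returns stacked residuals p_i - project(P_i)."""
    R, _ = cv2.Rodrigues(pose6[:3])
    Pc = P @ R.T + pose6[3:]                 # transform the 3-D points into the camera frame
    proj = Pc @ K.T
    proj = proj[:, :2] / proj[:, 2:3]        # divide by the scale factor s_i (the depth)
    return (p - proj).ravel()

def solve_initial_pose(P, p, K, pose0=np.zeros(6)):
    """Minimize the summed reprojection error, as in formula (2)."""
    res = least_squares(reprojection_residuals, pose0, args=(P, p, K))
    return res.x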
7. The RGB-D visual SLAM method according to claim 4, wherein the initialization of the camera transformation pose in step S34 is performed using a continuous motion model, specifically: the transformation pose ξ_{k,c-1} between the previous frame F_{c-1} of the current frame F_c and its corresponding key frame F_k is composed with the transformation pose ξ_{c-2,c-1} between the two frames F_{c-2} and F_{c-1} preceding the current frame F_c, to obtain the initial transformation pose ξ^{init}_{k,c} of the current frame F_c, as shown in formula (3):

ξ^{init}_{k,c} = ξ_{c-2,c-1} · ξ_{k,c-1}      (3)
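A one-line sketch of formula (3), with 4x4 homogeneous matrices standing in for the poses; the composition order shown follows the left-multiplication convention of step S6 and is an assumption about how the original formula reads.

import numpy as np

def constant_motion_init(T_c2_c1, T_k_c1):
    """Initial T_{k,c} = T_{c-2,c-1} @ T_{k,c-1} -- assumed ordering (formula (3))."""
    return T_c2_c1 @ T_k_c1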
8. The RGB-D visual SLAM method according to claim 1, wherein constructing the adaptation error based on the local gradients of the line segments to form the error function of the visual odometry module in step S5 comprises the following steps:
S51, the projected lines on the current frame are obtained from step S4, and the direction α of each projected line is calculated;
S52, each projected line is sampled at equal intervals, the number of sampling points of each projected line satisfying a formula in which N, the total number of sampling points of the projected line, is determined by L, the length of the corresponding projected line, L_s, the length of the shortest projected line in the current frame, and L_l, the length of the longest projected line in the current frame;
S53, a neighborhood of fixed size is defined centered on each sampling point, and the gradient magnitude and gradient direction of every pixel in the neighborhood are calculated;
S54, the neighborhood points are screened according to the gradient magnitudes and gradient directions obtained in step S53: when the gradient magnitude of a neighborhood point is below a set threshold, the point is considered to contribute nothing to the line and is discarded; otherwise it is retained;
S55, the distance from each neighborhood point retained in step S54 to the projected line it belongs to is calculated, and the weight of each neighborhood point is formed by assuming this distance follows a standard Gaussian normal distribution;
S56, the direction of the line is subtracted from the gradient direction of every pixel in the neighborhood of each sampling point to obtain a direction difference; if the direction difference is larger than a set threshold, the neighborhood point is further screened out and rejected; if the direction difference is within the set threshold, it is multiplied by the gradient magnitude of the neighborhood pixel to obtain the adaptation contribution of that neighborhood point;
S57, the direction-difference contributions calculated in step S56 for all sampling points of each projected line are multiplied by the weights of step S55 and summed to form the adaptation value between the line and its neighborhood image data;
S58, the adaptation values of the projected lines obtained in step S57 are accumulated to form the error function of the visual odometry module.
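A minimal sketch of the adaptation value of steps S51-S57 for a single projected line: neighbourhood pixels around equally spaced sample points are kept if their gradient magnitude and direction pass the thresholds, weighted by a Gaussian of their distance to the line, and accumulated. The window size, thresholds and sampling count are assumptions (the claimed sampling formula relating N to L, L_s and L_l is not reproduced here).

import numpy as np
import cv2

def line_adaptation(gray, x1, y1, x2, y2, n_samples=10, win=2,
                    mag_thresh=10.0, dir_thresh=np.pi / 6):
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
    alpha = np.arctan2(y2 - y1, x2 - x1)                  # S51: line direction
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):            # S52: equal-interval samples
        cx, cy = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
        for dy in range(-win, win + 1):                    # S53: fixed neighbourhood
            for dx in range(-win, win + 1):
                px, py = int(cx) + dx, int(cy) + dy
                if not (0 <= py < gray.shape[0] and 0 <= px < gray.shape[1]):
                    continue
                if mag[py, px] < mag_thresh:               # S54: weak gradients discarded
                    continue
                d = abs(np.cos(alpha) * (py - cy) - np.sin(alpha) * (px - cx))
                w = np.exp(-0.5 * d * d)                   # S55: Gaussian distance weight
                diff = abs(ang[py, px] - alpha) % np.pi
                diff = min(diff, np.pi - diff)             # undirected direction difference
                if diff > dir_thresh:                      # S56: reject misaligned points
                    continue
                score += w * diff * mag[py, px]            # S56-S57: accumulate contribution
    return score                                           # adaptation value of one line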
9. The RGB-D visual SLAM method based on point-line features of claim 1, wherein the local BA optimization in step S8 comprises the following steps:
S81, when the current frame is judged to be the latest key frame in step S7, a co-visibility local BA optimization based on the sliding window is constructed using the point and line features extracted in step S2, the camera transformation pose obtained in step S6, all the key frames in the sliding window and the point and line features belonging to them; the pose nodes of the co-visibility graph are the poses of all key frames contained in the current sliding window, and the point and line landmarks are the co-visible landmarks of all key frames in the sliding window; the co-visible landmarks between two key frames are computed using landmark-point matching based on Oriented FAST and Rotated BRIEF features and line-feature matching based on the Line Band Descriptor;
S82, because the size of the sliding window is fixed, if the sliding window is already full before the current new key frame is inserted, the oldest key frame and its landmark points are marginalized;
S83, the reprojection errors of the point and line features in the co-visibility graph are solved respectively, the two errors are combined to construct the error function of the back-end local BA optimization, and iterative optimization is then performed with the Levenberg-Marquardt method, thereby optimizing the key-frame poses and the co-visible landmarks in the co-visibility graph.
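A toy sliding-window bundle-adjustment sketch in the spirit of claim 9: key-frame poses and co-visible landmark points are optimized jointly by minimizing point reprojection error with Levenberg-Marquardt. The line reprojection terms and the marginalization of step S82 are omitted for brevity, and the observation layout and variable packing are assumptions (Levenberg-Marquardt also assumes there are at least as many residuals as variables).

import numpy as np
import cv2
from scipy.optimize import least_squares

def ba_residuals(x, n_kf, n_pts, obs, K):
    """obs: list of (kf_index, pt_index, u, v) observations of landmarks in key frames."""
    poses = x[:6 * n_kf].reshape(n_kf, 6)        # rotation vector + translation per key frame
    pts = x[6 * n_kf:].reshape(n_pts, 3)         # landmark points
    res = []
    for k, j, u, v in obs:
        R, _ = cv2.Rodrigues(poses[k, :3])
        Pc = R @ pts[j] + poses[k, 3:]
        proj = K @ Pc
        res.extend([u - proj[0] / proj[2], v - proj[1] / proj[2]])
    return np.asarray(res)

def local_ba(init_poses, init_points, obs, K):
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    sol = least_squares(ba_residuals, x0, method="lm",
                        args=(len(init_poses), len(init_points), obs, K))
    n_kf = len(init_poses)
    return sol.x[:6 * n_kf].reshape(-1, 6), sol.x[6 * n_kf:].reshape(-1, 3)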
10. The RGB-D visual SLAM method according to claim 1, wherein in step S9, the loop detection and global graph optimization performed on all the key frames obtained in step S8 comprise the following steps:
S91, based on a dictionary model, the word types corresponding to the descriptors of all point and line features of the current key frame are first looked up in the k-ary tree of the dictionary, and the weight corresponding to each word is calculated to obtain the word vector of the current key frame;
S92, the similarity or L1 norm between the word vector of the current key frame and the word vectors of the other key frames is calculated to obtain the inter-frame similarity, and the key frame with the highest similarity that also exceeds a preset threshold is selected as the loop frame of the current key frame;
S93, using the loop frame obtained in step S92, the transformation pose between the loop frame and the current key frame is solved with the error function of step S5;
S94, a pose graph of all key frames in the loop is constructed from the transformation poses solved in step S93, in which the nodes are the poses of each key frame and the edges are the co-visible landmark points between key frames, thereby realizing global pose-graph optimization.
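A minimal sketch of the similarity test of steps S91-S92: each key frame is summarized by a bag-of-words vector and compared against earlier key frames with an L1-based score; the vocabulary, the word-vector construction and the threshold value are assumed to be given.

import numpy as np

def l1_score(v1, v2):
    """1 - 0.5*|v1 - v2|_1 on normalized word vectors, a common bag-of-words score."""
    v1 = v1 / (np.abs(v1).sum() + 1e-12)
    v2 = v2 / (np.abs(v2).sum() + 1e-12)
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

def detect_loop(cur_vec, keyframe_vecs, threshold=0.6):
    """Return the index of the best-scoring earlier key frame above the threshold, else None."""
    best_idx, best_score = None, threshold
    for i, v in enumerate(keyframe_vecs):
        s = l1_score(cur_vec, v)
        if s > best_score:
            best_idx, best_score = i, s
    return best_idx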
CN202011358438.9A 2020-11-27 2020-11-27 RGB-D vision SLAM method based on dotted line characteristics Active CN112381890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011358438.9A CN112381890B (en) 2020-11-27 2020-11-27 RGB-D vision SLAM method based on dotted line characteristics


Publications (2)

Publication Number Publication Date
CN112381890A CN112381890A (en) 2021-02-19
CN112381890B (en) 2022-08-02

Family

ID=74587478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011358438.9A Active CN112381890B (en) 2020-11-27 2020-11-27 RGB-D vision SLAM method based on dotted line characteristics

Country Status (1)

Country Link
CN (1) CN112381890B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344995A (en) * 2021-06-22 2021-09-03 平安普惠企业管理有限公司 Data registration method, device and equipment based on linear features and storage medium
CN113465617B (en) * 2021-07-08 2024-03-19 上海汽车集团股份有限公司 Map construction method and device and electronic equipment
CN113536024B (en) * 2021-08-11 2022-09-09 重庆大学 ORB-SLAM relocation feature point retrieval acceleration method based on FPGA
CN114004808B (en) * 2021-10-29 2024-04-16 上海应用技术大学 Loop detection method based on differential manifold
CN116468786B (en) * 2022-12-16 2023-12-26 中国海洋大学 Semantic SLAM method based on point-line combination and oriented to dynamic environment


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096386B (en) * 2015-07-21 2017-11-17 中国民航大学 A wide range of complicated urban environment geometry map automatic generation method
US11187536B2 (en) * 2018-01-12 2021-11-30 The Trustees Of The University Of Pennsylvania Probabilistic data association for simultaneous localization and mapping
CN109934862A (en) * 2019-02-22 2019-06-25 上海大学 A kind of binocular vision SLAM method that dotted line feature combines

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060277A (en) * 2019-04-30 2019-07-26 哈尔滨理工大学 A kind of vision SLAM method of multiple features fusion
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN110853100A (en) * 2019-10-24 2020-02-28 东南大学 Structured scene vision SLAM method based on improved point-line characteristics
CN111982103A (en) * 2020-08-14 2020-11-24 北京航空航天大学 Point-line comprehensive visual inertial odometer method with optimized weight

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Improved Point-Line Feature Based Visual SLAM Method for Indoor Scenes;Runzhi Wang et al.;《sensors》;20181020;第1-20页 *
Monocular Visual Simultaneous Localization and Mapping Algorithm Based on Point-Line Features; Wang Dan et al.; Robot; 2019-05-31; Vol. 41, No. 3; pp. 392-403 *

Also Published As

Publication number Publication date
CN112381890A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN112381890B (en) RGB-D vision SLAM method based on dotted line characteristics
CN111156984B (en) Monocular vision inertia SLAM method oriented to dynamic scene
CN110533722B (en) Robot rapid repositioning method and system based on visual dictionary
CN107869989B (en) Positioning method and system based on visual inertial navigation information fusion
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN112598757B (en) Multi-sensor time-space calibration method and device
Prescott et al. Line-based correction of radial lens distortion
CN101383899A (en) Video image stabilizing method for space based platform hovering
CN112396656B (en) Outdoor mobile robot pose estimation method based on fusion of vision and laser radar
CN112233177A (en) Unmanned aerial vehicle pose estimation method and system
CN109272577B (en) Kinect-based visual SLAM method
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
Ribera et al. Estimating phenotypic traits from UAV based RGB imagery
CN113506342B (en) SLAM omni-directional loop correction method based on multi-camera panoramic vision
CN113743385A (en) Unmanned ship water surface target detection method and device and unmanned ship
CN111899345B (en) Three-dimensional reconstruction method based on 2D visual image
CN115240089A (en) Vehicle detection method of aerial remote sensing image
CN117218350A (en) SLAM implementation method and system based on solid-state radar
CN114529615A (en) Radar calibration method, device and storage medium
CN117036404A (en) Monocular thermal imaging simultaneous positioning and mapping method and system
CN114419259B (en) Visual positioning method and system based on physical model imaging simulation
Svedman et al. Structure from stereo vision using unsynchronized cameras for simultaneous localization and mapping
CN111383354B (en) SFM-based three-dimensional point cloud orientation correction method
CN113379738A (en) Method and system for detecting and positioning epidemic trees based on images
Majidi et al. Aerial tracking of elongated objects in rural environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant