CN106909877B - Visual simultaneous mapping and localization method based on integrated point-line features - Google Patents

Visual simultaneous mapping and localization method based on integrated point-line features

Info

Publication number
CN106909877B
CN106909877B (application CN201611142482.XA)
Authority
CN
China
Prior art keywords
line
feature
image
point
visual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611142482.XA
Other languages
Chinese (zh)
Other versions
CN106909877A (en)
Inventor
刘勇
左星星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201611142482.XA priority Critical patent/CN106909877B/en
Publication of CN106909877A publication Critical patent/CN106909877A/en
Application granted granted Critical
Publication of CN106909877B publication Critical patent/CN106909877B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a visual simultaneous mapping and localization method based on integrated point-line features, which jointly exploits line features and point features extracted from binocular camera images and can be used for robot localization and pose estimation in indoor and outdoor environments. For the parameterization of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, while in the back-end optimization the orthonormal representation of lines is used to minimize the number of line parameters. A visual dictionary integrating point and line features is built offline for closed-loop detection; by appending a flag bit to each descriptor, point and line features are treated separately within the visual dictionary, when building the image database, and when computing image similarity. The method can be used to build maps of indoor and outdoor scenes; the resulting maps integrate feature points and feature lines and provide richer information.

Description

Visual simultaneous mapping and localization method based on integrated point-line features
Technical Field
The invention relates to the technical field of visual simultaneous mapping and localization, and in particular to the field of feature-based binocular visual SLAM (Simultaneous Localization and Mapping).
Background
In visual simultaneous mapping and localization, keyframe-based graph optimization has become the mainstream framework for the visual SLAM problem. Graph optimization techniques have been shown to outperform conventional filtering frameworks in both computational resource consumption and consistency of results. Point features are the most widely used features in visual simultaneous mapping and localization: they are abundant in both indoor and outdoor environments, easy to track across continuous image sequences, and easy to handle in geometric transformations. However, point features are strongly environment-dependent, and high-quality point features require robust but time-consuming feature detection and description. Line features sit at a higher level of representation in the image than point features and provide more robust information in structured environments; by combining a smaller number of line and point features, environment maps and localization can be obtained more efficiently and accurately.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a visual SLAM method based on integrated point-line features that can be used for robot localization and pose estimation in indoor and outdoor environments; the combined use of point and line features makes the system more robust and more accurate. The method can be used to build maps of indoor and outdoor scenes; the resulting maps integrate feature points and feature lines and provide richer scene information. To this end, the invention provides the following technical scheme:
A visual simultaneous mapping and localization method based on integrated point-line features, characterized by comprising the following two parts: building a visual dictionary offline and building a sparse visual feature map online:
First, a tree-shaped visual dictionary, i.e., a KD-tree of the descriptor space, is built offline using a clustering method, and the inverse document frequency of each node in the tree is determined, each node being a cluster center of descriptors:
The features contained in each frame of image are converted into visual words, i.e., feature descriptors; hierarchical clustering is performed on the visual vocabulary to build a KD-tree of the descriptor space, called the visual dictionary; the tree-shaped visual dictionary is built offline from feature descriptors extracted from a set of training images. The descriptors are the ORB (Oriented FAST and Rotated BRIEF) point feature descriptor and the LBD (Line Band Descriptor) line feature descriptor. Both are binary descriptors, and each is extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features, so that the flag bits distinguish line features from point features;
The weight of each node in the visual dictionary is determined by the inverse document frequency (IDF; the main idea is that the fewer images contain visual word t, the larger the inverse document frequency, and the better the word t discriminates between categories);
Then, a sparse visual feature map is built online, comprising the following steps:
Step one, acquiring rectified images from the binocular camera, and extracting and describing features of the rectified images:
Extracting the point-line features and their descriptors in the rectified images, extracting ORB point feature descriptors and LBD line feature descriptors online;
Step two, performing feature matching and three-dimensional reconstruction on the rectified binocular images:
Matching the feature points and feature lines in the rectified images and establishing matching pairs; performing three-dimensional reconstruction of the feature points and feature lines using the binocular vision imaging model, with feature lines expressed in Plücker coordinates during reconstruction and the line endpoints maintained; building a sparse feature map of integrated point-line features from the feature points and feature lines, with Plücker coordinates used for the representation and computation of lines;
Step three, image matching between previous and current frames, local map matching, and camera motion estimation:
After the feature points and feature lines are reconstructed in three-dimensional space, the points and lines are tracked and matched. The matching comprises two parts: matching between the previous and current images, used to estimate the camera pose at the current time, and local map matching.
To solve for the pose, assume that the rotation and translation of the left camera coordinate system O_c at the current time in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_jw in the world coordinate system O_w; its coordinates P_jc in the left camera coordinate system O_c at the current time are:

P_jc = R_cw P_jw + t_cw
A reconstructed feature line i has coordinates L_iw = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw
where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for point and line features: the point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j; the line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation. The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β are the weights of the point-feature and line-feature reprojection errors, two constants; to eliminate the influence of incorrect image feature matches, the RANSAC (Random Sample Consensus) method is used during optimization to obtain the motion estimate;
Step four, performing loop detection using the visual dictionary obtained in step one:
An image database is built containing the point and line feature descriptors extracted from the visual keyframes. The features of an image are converted into a bag-of-words vector according to the established visual dictionary; the bag-of-words vector contains the TF-IDF score of each visual word in the image (TF is the frequency of the word within one frame, IDF is the inverse document frequency described above, and TF-IDF is their product). The more frequently a visual word appears in the same frame, the higher its TF-IDF score, but the more frequently it appears across the whole image database, the lower the score;
When evaluating the similarity of two images, each image is converted into a bag-of-words vector according to its extracted features, and the similarity score is computed from the bag-of-words vectors. The image acquired by the current camera is compared against the images in the image database; a higher-scoring image acquired at the same location constitutes a closed loop, indicating the location has been visited before. Geometric consistency (enough matching pairs between the two images supporting a Euclidean transformation) and temporal consistency (the several image sequences before and after the two images are also similar) are then used to further decide whether a closed loop has been formed;
Step five, putting the point-line features, camera motion estimates, and closed-loop detections obtained in steps two to four into a keyframe-based graph optimization framework, using the orthonormal representation of lines in the back-end optimization to minimize the number of line parameters. The camera poses and the poses of the feature points and lines are optimized in the graph optimization framework, achieving camera localization and construction of the online sparse visual feature map.
On the basis of the technical scheme, the invention can also adopt the following further technical scheme:
when the point-line characteristics in the image are extracted, FAST corner point detection is adopted for the detection of the characteristic points, an ORB descriptor is used for description, and an LSD algorithm is adopted for the detection of the line characteristics to extract the line characteristics, and an LBD descriptor is used for representing the line.
For the representation of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, and the orthonormal representation of lines is used in the back-end optimization to minimize the number of line parameters.
The clustering method k-means++ is used to build an offline visual dictionary integrating point-line features, which is used online to recognize and query similar images for loop detection. By adding flag bits when building the dictionary, point and line features are treated separately in the visual dictionary and when building the image database. When evaluating the similarity of two images, the images are converted into bag-of-words vectors according to the extracted features; the vectors contain the TF-IDF score of each visual word in the image. The more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score;
The bag-of-words vector contains a point-feature part v_i^p and a line-feature part v_i^l. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b are the weights of the point-feature score and line-feature score, two constants satisfying a + b = 1.
Owing to the above technical scheme, the invention has the following beneficial effects: the visual dictionary of the invention is trained on a variety of large datasets to achieve a good clustering effect, and once built the dictionary can be reused; the method can estimate the current camera pose using as few features as possible, while local map matching involves more features, so a more accurate solution can be obtained.
Drawings
FIG. 1 is the visual dictionary model of integrated point-line features built with a clustering method according to the present invention;
FIG. 2 is a schematic representation of the Plücker coordinates of a feature line of the present invention;
FIG. 3 is a schematic diagram of endpoint selection for an infinitely long straight line in space according to the present invention;
FIG. 4 is the reprojection error model of a feature line of the present invention;
FIG. 5 is the graph model built by the present invention from the front-end point-line features, camera motion estimates, closed-loop detections, etc.
Detailed Description
For a better understanding of the technical solution of the present invention, it is further described below with reference to the accompanying drawings.
Building a visual dictionary offline using a clustering method, and determining the inverse document frequency (IDF) of each node:
To judge whether the camera revisits the same area, the features contained in each frame of image are converted into visual words. These visual words correspond to a discretized descriptor space called the visual dictionary. As shown in Fig. 1, a large number of feature descriptors, extracted from a large set of training images, are used to build a tree dictionary offline; the process of building the tree dictionary is a process of repeated clustering with the k-means++ algorithm. The descriptors here are ORB point descriptors and LBD line descriptors. Because both are 256-bit binary descriptors, they can be placed in the same visual dictionary, which simplifies both dictionary construction and loop-detection operations. In general, an image contains many point features and few line features, so point and line features are treated separately in the visual dictionary. The two 256-bit binary descriptors are each extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features. The flag bits thus distinguish line features from point features, and point and line features are likewise kept separate when building the image database online, comparing image similarity, and so on. Fig. 1 shows the clustering-based visual dictionary model of integrated point-line features. The visual dictionary should be trained on a large and diverse dataset to achieve a good clustering effect, and once built it can be reused. The weight of each node in the visual dictionary is determined by the inverse document frequency (IDF) of all the feature descriptors contained in that node:
IDF = log(N / n_i)

where N is the total number of images in the dataset and n_i is the number of images containing the feature represented by the node.
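As an illustration, the following is a minimal sketch (not from the patent itself) of how the flag-bit extension, hierarchical k-means++ clustering, and IDF weighting described above could be implemented; the helper names (`extend_with_flag`, `build_vocab_tree`, `compute_idf`) are hypothetical, and binary descriptors are treated as float vectors for clustering, which is a simplification.

```python
import numpy as np
from sklearn.cluster import KMeans

def extend_with_flag(descriptors, flag):
    """Append a flag column (0 for ORB point features, 1 for LBD line
    features) to 256-bit binary descriptors stored as uint8 rows."""
    flags = np.full((descriptors.shape[0], 1), flag, dtype=np.uint8)
    return np.hstack([descriptors, flags])

def build_vocab_tree(descriptors, branching=10, depth=4):
    """Hierarchical k-means++ clustering of the descriptor space.
    Each node stores its cluster center and children."""
    node = {"center": descriptors.mean(axis=0), "children": []}
    if depth == 0 or len(descriptors) < branching:
        return node
    km = KMeans(n_clusters=branching, init="k-means++", n_init=3)
    labels = km.fit_predict(descriptors.astype(np.float32))
    for k in range(branching):
        subset = descriptors[labels == k]
        if len(subset):
            node["children"].append(build_vocab_tree(subset, branching, depth - 1))
    return node

def compute_idf(n_images_total, n_images_with_word):
    """IDF = log(N / n_i), exactly as defined above."""
    return np.log(n_images_total / max(n_images_with_word, 1))
```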
Main steps of the online visual SLAM with integrated point-line features:
step one, acquiring a corrected image from a binocular camera, and extracting and describing features of the image
Point and line features and their descriptors are extracted from the binocular camera images. Feature points are detected with FAST corner detection and described with ORB descriptors; these are very fast to compute and match while being invariant to rotation of the viewpoint. Line features are detected with the LSD (Line Segment Detector) algorithm and represented with LBD (Line Band Descriptor) descriptors. The ORB and LBD descriptors are both 256-bit binary descriptors with the same storage structure, which is convenient for building the offline dictionary of integrated point-line features, querying the image database, and so on. This step is the same as the feature and descriptor extraction in the offline dictionary-building process.
Step two, matching left and right image features and three-dimensional reconstruction
When matching the left and right images, the feature points and the midpoints of the feature lines in the right image are projected to the left image. Because the images are rectified, only the feature with the minimum Hamming distance to the right-image feature needs to be searched for within a rectangular window in the left image; that feature is the match. The Hamming distances are sorted and a threshold is selected adaptively, rejecting matching pairs with larger distances to ensure matching accuracy.
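A minimal sketch of this rectified-stereo matching step; the helper names, window size, and the 80% quantile threshold are assumptions for illustration, not values from the patent.

```python
import numpy as np

def hamming(a, b):
    # Hamming distance between two binary descriptors stored as uint8 arrays.
    return int(np.unpackbits(a ^ b).sum())

def match_rectified(left_feats, right_feats, window=(100, 3)):
    """For each right-image feature (pixel, descriptor), search a rectangular
    window in the left image (wide along u, narrow along v since the pair is
    rectified) and keep the candidate with minimum Hamming distance."""
    matches = []
    for j, (pr, dr) in enumerate(right_feats):
        best = None
        for i, (pl, dl) in enumerate(left_feats):
            # Same scanline band, non-negative disparity within the window.
            if abs(pl[1] - pr[1]) <= window[1] and 0 <= pl[0] - pr[0] <= window[0]:
                d = hamming(dl, dr)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is not None:
            matches.append(best)
    # Adaptive threshold: sort by distance, drop the worst pairs.
    matches.sort(key=lambda m: m[0])
    cutoff = max(1, int(0.8 * len(matches)))  # keep the best 80% (assumed)
    return matches[:cutoff]
```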
Three-dimensional reconstruction of feature points:
For the rectified images, assume the matching points in the left and right images are m = [u_1, v]^T and m' = [u_2, v]^T, and the coordinates of the three-dimensional point M determined by m and m' in the left camera coordinate system are [X, Y, Z]^T. Then:

X = B (u_1 - u_c) / d

Y = B (v - v_c) / d

Z = B f / d
where B, f, u_c, and v_c are parameters of the binocular stereo vision system after image rectification: B is the baseline distance of the binocular camera, f is the focal length of the camera, [u_c, v_c]^T is the pixel coordinate of the intersection of the optical axis with the image plane, and d = u_1 - u_2 is the disparity of the matching points, which reflects the depth of the three-dimensional point.
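A direct transcription of these triangulation formulas (the calibration values in the usage comment are placeholders):

```python
import numpy as np

def triangulate(u1, u2, v, B, f, uc, vc):
    """Rectified-stereo triangulation: returns the 3-D point [X, Y, Z] in
    the left camera frame from matched pixels m = [u1, v] (left) and
    m' = [u2, v] (right)."""
    d = u1 - u2                      # disparity
    if d <= 0:
        raise ValueError("non-positive disparity: point at or behind infinity")
    X = B * (u1 - uc) / d
    Y = B * (v - vc) / d
    Z = B * f / d
    return np.array([X, Y, Z])

# Example with placeholder calibration values:
# P = triangulate(u1=412.0, u2=396.5, v=240.0, B=0.12, f=718.0, uc=320.0, vc=240.0)
```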
Three-dimensional reconstruction of characteristic straight lines:
Representing a line by two three-dimensional endpoints is clearly unsuitable, because changes of viewpoint and occlusions make it very difficult to extract and track the endpoints of a line in the image. It is therefore most appropriate to represent a three-dimensional line in space as a line of infinite length. As shown in Fig. 2, Plücker coordinates are used for line computations, including geometric transformation and three-dimensional reconstruction, while the orthonormal representation of lines is used in the back-end optimization.
For three-dimensional reconstruction of a line, and for efficient geometric transformation and computation, the Plücker coordinates L = [n^T, v^T]^T are used to represent the line, as shown in Fig. 2, where n is the normal vector of the plane π formed by the line and the camera origin O_c, and v is the direction vector of the line L. The Plücker coordinates carry the constraint that n is perpendicular to v, i.e., n · v = 0. The projection of the space line L onto the image plane is a line l, and the corresponding spatial endpoints C, D project to points c, d. In the camera coordinate system O_c, c = KC, d = KD, n = C × D, l = c × d, where c, d, l are homogeneous coordinate representations and × is the cross product. K is the camera intrinsic parameter matrix,

K = [ f_x  0  u_c ; 0  f_y  v_c ; 0  0  1 ]
from which it can be derived that the line l in the image plane satisfies l = det(K) K^{-T} n. Suppose the plane formed by the left camera center and the space line L is π_l, and the plane formed by the right camera center and the space line L is π_r; the intersection of the two planes is the space line. The plane π_l is expressed as:

π_l = P_l^T l_l ∈ R^4

where l_l is the image of the space line in the left camera image plane and P_l is the projection matrix of the left camera,

P_l = K_l [I | 0]

where K_l is the intrinsic parameter matrix of the left camera, I is the 3 × 3 identity matrix, and 0 is a 3 × 1 zero vector. Similarly, the plane formed by the right camera center and the space line L can be obtained from the camera extrinsic parameters; its homogeneous coordinate representation is π_r. The intersection of the two planes is the space line L, whose dual Plücker matrix is expressed as
L* = π_l π_r^T - π_r π_l^T
The relationship between the dual prock matrix and the prock coordinate representation is:
Figure GDA0002162924780000091
the prock coordinates can be obtained by using the above formula.
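A minimal sketch of this two-plane line reconstruction, assuming the 3×4 projection matrices of both cameras are known (helper names are hypothetical):

```python
import numpy as np

def plane_from_image_line(P, l):
    """Back-project an image line l (3-vector, homogeneous) through a camera
    with 3x4 projection matrix P: pi = P^T l is a plane in R^4."""
    return P.T @ l

def plucker_from_planes(pi_l, pi_r):
    """Dual Pluecker matrix L* = pi_l pi_r^T - pi_r pi_l^T; read the line's
    Pluecker coordinates (n, v) off its block structure
    L* = [[v]_x, n; -n^T, 0]."""
    Lstar = np.outer(pi_l, pi_r) - np.outer(pi_r, pi_l)
    n = Lstar[:3, 3]
    # The top-left 3x3 block is the skew matrix [v]_x; recover v from it.
    S = Lstar[:3, :3]
    v = np.array([S[2, 1], S[0, 2], S[1, 0]])
    return n, v
```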
The above is the three-dimensional reconstruction of a feature line. In addition, since a scene map is to be built and the space line L is infinitely long, the space line needs to be trimmed for display, i.e., two endpoints C, D of the line are maintained. The endpoints C, D on the space line L can be determined, according to a fixed rule, from the endpoints of the image l of the space line L in the left camera image plane by geometric transformation; Fig. 3 shows this endpoint selection schematically. In the figure, ec is a segment perpendicular to l in the left camera image plane, and the distance between points e and c can be set to any value. The plane π' is the plane determined by the segment ec and the camera center O_c. The endpoint C is obtained by intersecting the space line L with the plane π'; the endpoint D is obtained similarly. During camera motion, the endpoints c and d of the image l of the same space line L on the left camera image plane are not fixed, so the trimmed C, D also vary; only the pair C, D with the largest distance is kept as the endpoints of the line maintained in space.
Step three, matching features between the previous and current images, and camera motion estimation
After feature matching and three-dimensional reconstruction of the left and right images, the three-dimensional coordinates P_jw of feature point j and the Plücker coordinates L_iw of feature line i in the world coordinate system are obtained. After matching the previous and current images, the projection p_j of feature point j and the projection l_i of feature line i in the left image at the current time are obtained. Assume the rotation and translation of the current left camera coordinate system O_c in the world coordinate system O_w are R_wc and t_wc. Then the feature point has coordinates P_jc = R_cw P_jw + t_cw in the left camera coordinate system O_c at the current time, and the feature line i has coordinates

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw

where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for the point and line features. The point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j:

e_pj = || p̂_j - p_j ||
The line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation:

e_li = [ l_c^T ep1_i ; l_c^T ep2_i ] / sqrt(l_c1² + l_c2²)

where ep1_i = [ep1_i1, ep1_i2, 1]^T is the homogeneous coordinate representation of endpoint ep1_i, ep2_i is likewise the homogeneous coordinate representation of endpoint ep2_i, and l_c = [l_c1, l_c2, l_c3]^T is the vector of coefficients of the line equation l_c.
The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β, the weights of the point-feature and line-feature reprojection errors, are two constants that can be set empirically; to eliminate the influence of incorrect image feature matches, the RANSAC method can be used in this step to obtain a better motion estimate.
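A compact sketch of this RANSAC loop around the least-squares objective; this is a simplified illustration rather than the patent's exact procedure: `estimate_pose` and `point_error` are hypothetical helpers, only point features are shown, and the iteration count, sample size, and inlier threshold are assumptions.

```python
import numpy as np

def ransac_motion(points_3d, points_2d, estimate_pose, point_error,
                  iters=100, sample_size=3, inlier_thresh=2.0):
    """RANSAC over 3D-2D point matches: repeatedly fit a pose to a minimal
    sample, count inliers by reprojection error (pixels), and refit on the
    best inlier set. estimate_pose(P3, p2) -> (R, t)."""
    rng = np.random.default_rng(0)
    best_inliers = []
    n = len(points_3d)
    for _ in range(iters):
        idx = rng.choice(n, size=sample_size, replace=False)
        R, t = estimate_pose(points_3d[idx], points_2d[idx])
        errs = np.array([point_error(R, t, P, p)
                         for P, p in zip(points_3d, points_2d)])
        inliers = np.where(errs < inlier_thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final refinement on all inliers: this is where the weighted
    # point+line least-squares objective above would be minimized.
    R, t = estimate_pose(points_3d[best_inliers], points_2d[best_inliers])
    return R, t, best_inliers
```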
Step four, loop detection using the offline-trained visual dictionary
An image database is built from the point and line feature descriptors extracted from the visual keyframes. The distances between all descriptors extracted from a keyframe and the cluster centers, i.e., the nodes in the visual dictionary, are computed; one layer in the dictionary tree is selected as the comparison level (typically levels 4-6), and every extracted descriptor is assigned to the node of the dictionary tree closest to it at that level. From this assignment the image can be discretized into a bag-of-words vector whose dimension is the number of nodes at the comparison level; the vector contains the TF-IDF score of each visual word in the image, and it has a point-feature part v_i^p and a line-feature part v_i^l. The more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score. TF-IDF is:
TF-IDF = IDF · (n_iIt / n_It)

where n_iIt is the number of occurrences of the visual word in image I_t, n_It is the total number of visual words in image I_t, and IDF is the inverse document frequency of the visual word in the established offline visual dictionary.
The newly generated bag-of-words vector is then compared with the bag-of-words vectors in the image database for similarity judgment. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b, the weights of the point-feature score and line-feature score, are two constants satisfying a + b = 1 that can be set empirically. Loop detection based on similarity alone can produce false detections, so other information is needed. Images close in time in the database generally receive similar scores; exploiting this, images close in time are grouped, and scores are compared group by group, the score of an image group being the sum of the scores of each frame in the group. A frame's score must exceed a certain threshold before it is added to the group score. After searching the entire image database, the group with the highest score is selected, and the image with the highest single-frame score within it is taken as the candidate closed-loop image. Finally, the closed-loop image pair is confirmed using strategies such as geometric verification (comparing all feature points in the images) and temporal consistency (the images in the time windows before and after the closed-loop pair are also similar).
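The scoring described above might look like the following sketch; the per-part similarity s(·,·) here is the common normalized L1 bag-of-words score, which is an assumption since the patent does not fix its form, and the 0.7/0.3 weight split is a placeholder.

```python
import numpy as np

def tfidf_vector(word_counts, idf, n_words_in_image):
    """Bag-of-words vector: TF-IDF = IDF * (n_iIt / n_It) per visual word.
    word_counts and idf are aligned arrays over the comparison-level nodes."""
    return idf * (word_counts / max(n_words_in_image, 1))

def l1_score(u, w):
    # Normalized L1 similarity in [0, 1] (assumed form of s(.,.)).
    u = u / (np.abs(u).sum() + 1e-12)
    w = w / (np.abs(w).sum() + 1e-12)
    return 1.0 - 0.5 * np.abs(u - w).sum()

def similarity(v1_p, v1_l, v2_p, v2_l, a=0.7, b=0.3):
    """s(v1, v2) = a * s(v1^p, v2^p) + b * s(v1^l, v2^l), with a + b = 1.
    The flag bits let the point part and line part be scored separately."""
    return a * l1_score(v1_p, v2_p) + b * l1_score(v1_l, v2_l)
```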
Step five, putting the point-line features, camera motion estimates, closed-loop detections, etc., obtained at the front end into a keyframe-based graph optimization framework
The objective function to be optimized, namely the error models of the point features and line features and the closed-loop constraint error model, is formulated. This is a nonlinear optimization problem: a graph model can be built and then solved by iterative optimization, exploiting sparsity, with open-source graph optimization tools such as g2o (General Graph Optimization), GTSAM (Georgia Tech Smoothing and Mapping), or Ceres Solver. The result is the optimized camera positions and attitudes and the points and lines in space.
Error model of point features:
Assume the rotation and translation of the left camera coordinate system O_c at the current time i in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_wj in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

P_ij = R_cw P_wj + t_cw

P_ij = [x_ij, y_ij, z_ij]^T
P_ij is projected into the left camera image by the camera projection model, with image coordinates

p̂_ij = π(P_ij)

where π is the projection equation:

π(P_ij) = [ f_x x_ij / z_ij + u_c ; f_y y_ij / z_ij + v_c ]
where f_x, f_y are the focal lengths of the camera in the horizontal and vertical directions and (u_c, v_c) is the imaging origin (principal point) of the camera; these are the camera intrinsics.
The reprojection error of the point is defined as the distance e_ij between the projected coordinates p̂_ij of the feature point and the observed coordinates p_ij:

e_ij = p̂_ij - p_ij
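A direct transcription of this point error model (camera intrinsics would be supplied by calibration; any values used in a call are placeholders):

```python
import numpy as np

def project(P_c, fx, fy, uc, vc):
    """Pinhole projection pi(P): camera-frame point [x, y, z] -> pixel."""
    x, y, z = P_c
    return np.array([fx * x / z + uc, fy * y / z + vc])

def point_error(R_cw, t_cw, P_w, p_obs, fx, fy, uc, vc):
    """Reprojection error e = pi(R_cw P_w + t_cw) - p_obs."""
    P_c = R_cw @ P_w + t_cw
    return project(P_c, fx, fy, uc, vc) - p_obs
```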
Error model of line features:
A reconstructed feature line k has coordinates L_wk = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ik = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_wk

As shown in Fig. 4, the feature line L_ik is projected into the current left camera image, giving the projected line equation

l̂_ik = det(K) K^{-T} n_ik

where n_ik is the normal component of L_ik.
The projected line of the line L on the left camera image plane is l̂_ik, and the observed line segment is l_ik. The distances d_l1, d_l2 from the endpoints of the observed segment l_ik to the projected line l̂_ik are set as the error function:

e_lk = [ d_l1 ; d_l2 ] = [ a^T l̂_ik ; b^T l̂_ik ] / sqrt(l̂_ik1² + l̂_ik2²)
where a = [a_1, a_2, 1]^T is the homogeneous coordinate of endpoint a, b = [b_1, b_2, 1]^T is the homogeneous coordinate of endpoint b, and l̂_ik = [l̂_ik1, l̂_ik2, l̂_ik3]^T is the vector of coefficients of the projected line equation.
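The line error model, transcribed as a sketch; the Plücker transform and projection follow the formulas above, and the signed (rather than absolute) distances are a convention choice.

```python
import numpy as np

def skew(t):
    # 3x3 skew-symmetric matrix [t]_x such that [t]_x v = t x v.
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def line_error(R_cw, t_cw, n_w, v_w, a_obs, b_obs, K):
    """Distances from the observed segment endpoints a, b (pixels) to the
    projected line of the world line (n_w, v_w) in Pluecker coordinates."""
    # Transform the line into the camera frame: n_c = R n_w + [t]_x R v_w.
    n_c = R_cw @ n_w + skew(t_cw) @ (R_cw @ v_w)
    # Project: l = det(K) K^{-T} n_c.
    l = np.linalg.det(K) * np.linalg.inv(K).T @ n_c
    a_h = np.array([a_obs[0], a_obs[1], 1.0])
    b_h = np.array([b_obs[0], b_obs[1], 1.0])
    norm = np.hypot(l[0], l[1])
    return np.array([a_h @ l, b_h @ l]) / norm
```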
In the back-end optimization, to minimize the number of line parameters and prevent over-parameterization, the line is parameterized with the orthonormal representation (U, W) ∈ SO(3) × SO(2), where SO(3) is the group of three-dimensional rotation matrices and SO(2) the group of two-dimensional rotation matrices, with 3 and 1 degrees of freedom respectively:
U = [ n/||n||   v/||v||   (n×v)/||n×v|| ]

Let

W = (1 / sqrt(||n||² + ||v||²)) [ ||n||   -||v|| ; ||v||   ||n|| ]

Here a minimum of four parameters are used,

δ = [θ^T, φ]^T

where θ is a 3 × 1 vector and φ is a scalar. The representation is updated through

U* ← R(θ) U,   W* ← R(φ) W

to update (U, W) ∈ SO(3) × SO(2).
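A sketch of the Plücker-to-orthonormal conversion and its minimal 4-DoF update, following the standard orthonormal line representation; whether the update multiplies on the left (R(θ)U) or right (UR(θ)) is a convention choice.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def plucker_to_orthonormal(n, v):
    """(n, v) -> (U, W) with U in SO(3), W in SO(2)."""
    U = np.column_stack([n / np.linalg.norm(n),
                         v / np.linalg.norm(v),
                         np.cross(n, v) / np.linalg.norm(np.cross(n, v))])
    s = np.hypot(np.linalg.norm(n), np.linalg.norm(v))
    W = np.array([[np.linalg.norm(n), -np.linalg.norm(v)],
                  [np.linalg.norm(v),  np.linalg.norm(n)]]) / s
    return U, W

def update(U, W, theta, phi):
    """Minimal update delta = [theta, phi]: U <- R(theta) U, W <- R(phi) W."""
    U_new = Rotation.from_rotvec(theta).as_matrix() @ U
    Rphi = np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])
    return U_new, Rphi @ W

def orthonormal_to_plucker(U, W):
    """Recover (n, v) up to a common scale: n = W[0,0] u1, v = W[1,0] u2."""
    return W[0, 0] * U[:, 0], W[1, 0] * U[:, 1]
```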
Error model of the closed-loop constraint:
Suppose the position and attitude x_i of the camera at a certain time is detected, by the closed-loop detection method, to be at the same place as a position i' already visited, i.e., a closed-loop pair x_i and x_i' is found, generating a closed-loop constraint C_l. The error of the closed-loop constraint is e_c = x_i - g(x_i', C_l), where g is the function that computes the position and attitude at one time of the closed-loop matching pair from the position and attitude at the other time and the closed-loop constraint.
Taking the feature points and feature lines as landmarks l, the camera poses x and the landmarks l as nodes of the graph model, and the loop detections C_l and the binocular camera observations Z as edges, a graph model is established, as shown in Fig. 5. The problem graph optimization solves is to continually optimize the variables l, x while u, Z, C_l are known, so the known u, Z, C_l are taken as the observations Z and the variables l, x as the state X. The graph optimization model maximizes the joint probability to obtain l*, x*:

(l*, x*) = argmax_{l,x} P(l, x | Z)

P(X | Z) ∝ ∏_k P(z_k | X)
The observation Z between states X_i, X_j is assumed to have observation error e(X_i, X_j), i.e., one of the four errors defined above. Assuming all errors obey zero-mean Gaussian distributions with covariance Σ_ij, then

P(z_k | X_i, X_j) ∝ exp( -½ e(X_i, X_j)^T Σ_ij^{-1} e(X_i, X_j) )
Taking the negative logarithm of the above equation, the objective function of the graph optimization model will become:
Figure GDA0002162924780000145
This is a nonlinear optimization problem and can be solved within a graph optimization framework by methods such as Gauss-Newton, LM (Levenberg-Marquardt), and Dogleg (Powell's method).
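As an illustration of the back-end solve, here is a minimal sketch that stacks the weighted residuals and hands them to a Levenberg-Marquardt solver; a real implementation would use g2o, GTSAM, or Ceres Solver with analytic Jacobians and manifold parameterizations, so this is only a conceptual stand-in.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_graph(x0, residual_blocks):
    """x0: stacked state vector (poses and landmarks, minimally
    parameterized). residual_blocks: list of (fn, sqrt_info) pairs, where
    fn(x) returns a residual vector e and sqrt_info is a square root of the
    information matrix Sigma^{-1}, so the stacked cost equals
    sum_ij e^T Sigma^{-1} e."""
    def stacked(x):
        return np.concatenate([sqrt_info @ fn(x)
                               for fn, sqrt_info in residual_blocks])
    result = least_squares(stacked, x0, method="lm")  # Levenberg-Marquardt
    return result.x
```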

Claims (4)

1. A visual simultaneous mapping and localization method based on integrated point-line features, characterized by comprising the following two parts: building a visual dictionary offline and building a sparse visual feature map online:
First, a tree-shaped visual dictionary, i.e., a KD-tree of the descriptor space, is built offline using a clustering method, and the inverse document frequency of each node in the tree is determined, each node being a cluster center of descriptors:
The features contained in each frame of image are converted into visual words, i.e., feature descriptors; hierarchical clustering is performed on the visual vocabulary to build a KD-tree of the descriptor space, called the visual dictionary; the tree-shaped visual dictionary is built offline from feature descriptors extracted from a set of training images. The descriptors are the ORB (Oriented FAST and Rotated BRIEF) point feature descriptor and the LBD (Line Band Descriptor) line feature descriptor; both are binary descriptors, and each is extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features, so that the flag bits distinguish line features from point features; before obtaining the LBD line feature descriptor, the line is first detected with LSD and then described with the LBD descriptor;
The weight of each node in the visual dictionary is determined by the inverse document frequency of all the feature descriptors contained in that node;
Then, a sparse visual feature map is built online, comprising the following steps:
Step one, acquiring rectified images from the binocular camera, and extracting and describing features of the rectified images:
Extracting the point-line features and their descriptors in the rectified images, extracting ORB point feature descriptors and LBD line feature descriptors online;
Step two, performing feature matching and three-dimensional reconstruction on the rectified binocular images:
Matching the feature points and feature lines in the rectified images and establishing matching pairs; performing three-dimensional reconstruction of the feature points and feature lines using the binocular vision imaging model, with feature lines expressed in Plücker coordinates during reconstruction and the line endpoints maintained; building a sparse feature map of integrated point-line features from the feature points and feature lines, with Plücker coordinates used for the representation and computation of lines;
Step three, image matching between previous and current frames, local map matching, and camera motion estimation:
After the feature points and feature lines are reconstructed in three-dimensional space, the points and lines are tracked and matched. The matching comprises two parts: matching between the previous and current images, used to estimate the camera pose at the current time, and local map matching.
To solve for the pose, assume that the rotation and translation of the left camera coordinate system O_c at the current time in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_jw in the world coordinate system O_w; its coordinates P_jc in the left camera coordinate system O_c at the current time are:

P_jc = R_cw P_jw + t_cw
A reconstructed feature line i has coordinates L_iw = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw
where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for point and line features: the point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j; the line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation. The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β are the weights of the point-feature and line-feature reprojection errors, two constants; to eliminate the influence of incorrect image feature matches, the RANSAC method is used during optimization to obtain the motion estimate;
Step four, performing loop detection using the visual dictionary obtained in step one:
Point and line feature descriptors are extracted from the visual keyframes and an image database is built containing the point and line feature descriptors of each keyframe. The features of an image are converted into a bag-of-words vector according to the established visual dictionary; the bag-of-words vector contains the TF-IDF score of each visual word in the image, where TF is the frequency of the word within one frame, IDF is the inverse document frequency described above, and TF-IDF is their product. The more frequently a visual word appears in the same frame, the higher its TF-IDF score, but the more frequently it appears across the whole image database, the lower the score;
When evaluating the similarity of two images, each image is converted into a bag-of-words vector according to its extracted features, and the similarity score is computed from the bag-of-words vectors. The image acquired by the current camera is compared against the images in the image database; a higher-scoring image acquired at the same location constitutes a closed loop, indicating the location has been visited before. Geometric consistency (enough matching pairs between the two images supporting a Euclidean transformation) and temporal consistency (the several image sequences before and after the two images are also similar) are then used to further decide whether a closed loop has been formed;
Step five, putting the point-line features, camera motion estimates, and closed-loop detections obtained in steps two to four into a keyframe-based graph optimization framework, using the orthonormal representation of lines in the back-end optimization to minimize the number of line parameters; the camera poses and the poses of the feature points and lines are optimized in the graph optimization framework, achieving camera localization and construction of the online sparse visual feature map.
2. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein, when extracting the point-line features in the image, feature points are detected with FAST corner detection and described with ORB descriptors, and line features are detected and extracted with the LSD algorithm and represented with LBD descriptors.
3. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein, for the representation of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, and the orthonormal representation of lines is used in the back-end optimization to minimize the number of line parameters.
4. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein the clustering method k-means++ is used to build an offline visual dictionary integrating point-line features, used online to recognize and query similar images for loop detection; by adding flag bits when building the dictionary, point and line features are treated separately in the visual dictionary and when building the image database; when evaluating the similarity of two images, the images are converted into bag-of-words vectors according to the extracted features, containing the TF-IDF score of each visual word in the image; the more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score;
The bag-of-words vector contains a point-feature part v_i^p and a line-feature part v_i^l. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b are the weights of the point-feature score and line-feature score, two constants satisfying a + b = 1.
CN201611142482.XA 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features Active CN106909877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611142482.XA CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611142482.XA CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Publications (2)

Publication Number Publication Date
CN106909877A CN106909877A (en) 2017-06-30
CN106909877B true CN106909877B (en) 2020-04-14

Family

ID=59206482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611142482.XA Active CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Country Status (1)

Country Link
CN (1) CN106909877B (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392964B (en) * 2017-07-07 2019-09-17 武汉大学 The indoor SLAM method combined based on indoor characteristic point and structure lines
CN107329490B (en) * 2017-07-21 2020-10-09 歌尔科技有限公司 Unmanned aerial vehicle obstacle avoidance method and unmanned aerial vehicle
CN107752910A (en) * 2017-09-08 2018-03-06 珠海格力电器股份有限公司 Region cleaning method, device, storage medium, processor and sweeping robot
CN107680133A (en) * 2017-09-15 2018-02-09 重庆邮电大学 A kind of mobile robot visual SLAM methods based on improvement closed loop detection algorithm
WO2019057179A1 (en) 2017-09-22 2019-03-28 华为技术有限公司 Visual slam method and apparatus based on point and line characteristic
CN109558879A (en) * 2017-09-22 2019-04-02 华为技术有限公司 A kind of vision SLAM method and apparatus based on dotted line feature
CN107885224A (en) * 2017-11-06 2018-04-06 北京韦加无人机科技股份有限公司 Unmanned plane barrier-avoiding method based on tri-item stereo vision
CN107869989B (en) * 2017-11-06 2020-02-07 东北大学 Positioning method and system based on visual inertial navigation information fusion
CN107784671B (en) * 2017-12-01 2021-01-29 驭势科技(北京)有限公司 Method and system for visual instant positioning and drawing
CN108090959B (en) * 2017-12-07 2021-09-10 中煤航测遥感集团有限公司 Indoor and outdoor integrated modeling method and device
CN108230337B (en) * 2017-12-31 2020-07-03 厦门大学 Semantic SLAM system implementation method based on mobile terminal
CN108107897B (en) * 2018-01-11 2021-04-16 驭势科技(北京)有限公司 Real-time sensor control method and device
CN108363387B (en) * 2018-01-11 2021-04-16 驭势科技(北京)有限公司 Sensor control method and device
CN110399892B (en) * 2018-04-24 2022-12-02 北京京东尚科信息技术有限公司 Environmental feature extraction method and device
CN108682027A (en) * 2018-05-11 2018-10-19 北京华捷艾米科技有限公司 VSLAM realization method and systems based on point, line Fusion Features
CN108961322B (en) * 2018-05-18 2021-08-10 辽宁工程技术大学 Mismatching elimination method suitable for landing sequence images
CN108921896B (en) * 2018-06-15 2021-04-30 浙江大学 Downward vision compass integrating dotted line characteristics
CN109074676B (en) * 2018-07-03 2023-07-07 达闼机器人股份有限公司 Method for establishing map, positioning method, terminal and computer readable storage medium
CN109101981B (en) * 2018-07-19 2021-08-24 东南大学 Loop detection method based on global image stripe code in streetscape scene
CN109034237B (en) * 2018-07-20 2021-09-17 杭州电子科技大学 Loop detection method based on convolutional neural network signposts and sequence search
CN109165680B (en) * 2018-08-01 2022-07-26 东南大学 Single-target object dictionary model improvement method in indoor scene based on visual SLAM
CN109166149B (en) * 2018-08-13 2021-04-02 武汉大学 Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN109409418B (en) * 2018-09-29 2022-04-15 中山大学 Loop detection method based on bag-of-words model
CN109493385A (en) * 2018-10-08 2019-03-19 上海大学 Autonomic positioning method in a kind of mobile robot room of combination scene point line feature
CN109752003B (en) * 2018-12-26 2021-03-02 浙江大学 Robot vision inertia point-line characteristic positioning method and device
CN110033514B (en) * 2019-04-03 2021-05-28 西安交通大学 Reconstruction method based on point-line characteristic rapid fusion
CN111830517B (en) * 2019-04-17 2023-08-01 北京地平线机器人技术研发有限公司 Method and device for adjusting laser radar scanning range and electronic equipment
CN110375732A (en) * 2019-07-22 2019-10-25 中国人民解放军国防科技大学 Monocular camera pose measurement method based on inertial measurement unit and point line characteristics
CN110473258B (en) * 2019-07-24 2022-05-13 西北工业大学 Monocular SLAM system initialization algorithm based on point-line unified framework
CN110455301A (en) * 2019-08-01 2019-11-15 河北工业大学 A kind of dynamic scene SLAM method based on Inertial Measurement Unit
CN111076733B (en) * 2019-12-10 2022-06-14 亿嘉和科技股份有限公司 Robot indoor map building method and system based on vision and laser slam
CN111310772B (en) * 2020-03-16 2023-04-21 上海交通大学 Point line characteristic selection method and system for binocular vision SLAM
CN111899334B (en) * 2020-07-28 2023-04-18 北京科技大学 Visual synchronous positioning and map building method and device based on point-line characteristics
CN112085790A (en) * 2020-08-14 2020-12-15 香港理工大学深圳研究院 Point-line combined multi-camera visual SLAM method, equipment and storage medium
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112507778B (en) * 2020-10-16 2022-10-04 天津大学 Loop detection method of improved bag-of-words model based on line characteristics
CN113298014B (en) * 2021-06-09 2021-12-17 安徽工程大学 Closed loop detection method, storage medium and equipment based on reverse index key frame selection strategy
CN113393524B (en) * 2021-06-18 2023-09-26 常州大学 Target pose estimation method combining deep learning and contour point cloud reconstruction
CN113514067A (en) * 2021-06-24 2021-10-19 上海大学 Mobile robot positioning method based on point-line characteristics
CN113432593B (en) * 2021-06-25 2023-05-23 北京华捷艾米科技有限公司 Centralized synchronous positioning and map construction method, device and system
CN113450412B (en) * 2021-07-15 2022-06-03 北京理工大学 Visual SLAM method based on linear features
CN113532431A (en) * 2021-07-15 2021-10-22 贵州电网有限责任公司 Visual inertia SLAM method for power inspection and operation
CN114789446A (en) * 2022-05-27 2022-07-26 平安普惠企业管理有限公司 Robot pose estimation method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012192090A (en) * 2011-03-17 2012-10-11 Kao Corp Information processing method, method for estimating orbitale, method for calculating frankfurt plane, and information processor
CN102855649A (en) * 2012-08-23 2013-01-02 山东电力集团公司电力科学研究院 Method for splicing high-definition image panorama of high-pressure rod tower on basis of ORB (Object Request Broker) feature point
CN102967297A (en) * 2012-11-23 2013-03-13 浙江大学 Space-movable visual sensor array system and image information fusion method
CN104639932A (en) * 2014-12-12 2015-05-20 浙江大学 Free stereoscopic display content generating method based on self-adaptive blocking
CN104915949A (en) * 2015-04-08 2015-09-16 华中科技大学 Image matching algorithm of bonding point characteristic and line characteristic
CN106022304A (en) * 2016-06-03 2016-10-12 浙江大学 Binocular camera-based real time human sitting posture condition detection method


Also Published As

Publication number Publication date
CN106909877A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106909877B (en) Visual simultaneous mapping and localization method based on integrated point-line features
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
Gálvez-López et al. Real-time monocular object slam
Wang et al. Sketch-based 3d shape retrieval using convolutional neural networks
Khan et al. IBuILD: Incremental bag of binary words for appearance based loop closure detection
US9449392B2 (en) Estimator training method and pose estimating method using depth image
CN107796397A (en) A kind of Robot Binocular Vision localization method, device and storage medium
Mei et al. Closing loops without places
CN111462210B (en) Monocular line feature map construction method based on epipolar constraint
CN110717927A (en) Indoor robot motion estimation method based on deep learning and visual inertial fusion
CN111311708B (en) Visual SLAM method based on semantic optical flow and inverse depth filtering
CN112562081B (en) Visual map construction method for visual layered positioning
CN110119768B (en) Visual information fusion system and method for vehicle positioning
CN112200915A (en) Front and back deformation amount detection method based on target three-dimensional model texture image
Tardós et al. Real-time monocular object SLAM
Lu et al. Large-scale tracking for images with few textures
SANDOVAL et al. Robust sphere detection in unorganized 3D point clouds using an efficient Hough voting scheme based on sliding voxels
CN113570713B (en) Semantic map construction method and device for dynamic environment
CN110930519B (en) Semantic ORB-SLAM sensing method and device based on environment understanding
CN115330861A (en) Repositioning algorithm based on object plane common representation and semantic descriptor matching
CN113888603A (en) Loop detection and visual SLAM method based on optical flow tracking and feature matching
Jaenal et al. Unsupervised appearance map abstraction for indoor visual place recognition with mobile robots
Tomoya et al. Change detection under global viewpoint uncertainty
Moura et al. VEM-SLAM-Virtual environment modelling through SLAM
Loncomilla et al. Visual SLAM based on rigid-body 3D landmarks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant