CN106909877B - Visual simultaneous mapping and localization method based on integrated point-line features - Google Patents

Visual simultaneous mapping and localization method based on integrated point-line features

Info

Publication number
CN106909877B
CN106909877B (application CN201611142482.XA)
Authority
CN
China
Prior art keywords
line
feature
image
point
visual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611142482.XA
Other languages
Chinese (zh)
Other versions
CN106909877A (en)
Inventor
刘勇
左星星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201611142482.XA priority Critical patent/CN106909877B/en
Publication of CN106909877A publication Critical patent/CN106909877A/en
Application granted granted Critical
Publication of CN106909877B publication Critical patent/CN106909877B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a visual simultaneous mapping and localization method based on integrated point-line features, which jointly exploits line features and point features extracted from binocular camera images and can be used for robot localization and pose estimation in indoor and outdoor environments. For the parameterization of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, while in the back-end optimization the orthonormal representation of lines is used to minimize the number of line parameters. A visual dictionary integrating point and line features is built offline for closed-loop detection; by appending a flag bit to each descriptor, point and line features are treated separately within the visual dictionary, when building the image database, and when computing image similarity. The method can be used to build maps of indoor and outdoor scenes; the resulting maps integrate feature points and feature lines and provide richer information.

Description

Visual simultaneous mapping and localization method based on integrated point-line features
Technical Field
The invention relates to the technical field of visual simultaneous mapping and localization, and in particular to the field of feature-based binocular visual SLAM (Simultaneous Localization and Mapping).
Background
In visual simultaneous mapping and localization, keyframe-based graph optimization has become the mainstream framework for the visual SLAM problem. Graph optimization techniques have been shown to outperform conventional filtering frameworks in both computational resource consumption and consistency of results. Point features are the most widely used features in visual simultaneous mapping and localization: they are abundant in both indoor and outdoor environments, easy to track across continuous image sequences, and easy to handle in geometric transformations. However, point features are strongly environment-dependent, and high-quality point features require robust but time-consuming feature detection and description. Line features sit at a higher level of representation in the image than point features and provide more robust information in structured environments; by combining a smaller number of line and point features, environment maps and localization can be obtained more efficiently and accurately.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a visual SLAM method based on integrated point-line features that can be used for robot localization and pose estimation in indoor and outdoor environments; the combined use of point and line features makes the system more robust and more accurate. The method can be used to build maps of indoor and outdoor scenes; the resulting maps integrate feature points and feature lines and provide richer scene information. To this end, the invention provides the following technical scheme:
A visual simultaneous mapping and localization method based on integrated point-line features, characterized by comprising the following two parts: building a visual dictionary offline and building a sparse visual feature map online:
First, a tree-shaped visual dictionary, i.e., a KD-tree of the descriptor space, is built offline using a clustering method, and the inverse document frequency of each node in the tree is determined, each node being a cluster center of descriptors:
The features contained in each frame of image are converted into visual words, i.e., feature descriptors; hierarchical clustering is performed on the visual vocabulary to build a KD-tree of the descriptor space, called the visual dictionary; the tree-shaped visual dictionary is built offline from feature descriptors extracted from a set of training images. The descriptors are the ORB (Oriented FAST and Rotated BRIEF) point feature descriptor and the LBD (Line Band Descriptor) line feature descriptor. Both are binary descriptors, and each is extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features, so that the flag bits distinguish line features from point features;
The weight of each node in the visual dictionary is determined by the inverse document frequency (IDF; the main idea is that the fewer images contain visual word t, the larger the inverse document frequency, and the better the word t discriminates between categories);
Then, a sparse visual feature map is built online, comprising the following steps:
Step one, acquiring rectified images from the binocular camera, and extracting and describing features of the rectified images:
Extracting the point-line features and their descriptors in the rectified images, extracting ORB point feature descriptors and LBD line feature descriptors online;
Step two, performing feature matching and three-dimensional reconstruction on the rectified binocular images:
Matching the feature points and feature lines in the rectified images and establishing matching pairs; performing three-dimensional reconstruction of the feature points and feature lines using the binocular vision imaging model, with feature lines expressed in Plücker coordinates during reconstruction and the line endpoints maintained; building a sparse feature map of integrated point-line features from the feature points and feature lines, with Plücker coordinates used for the representation and computation of lines;
Step three, image matching between previous and current frames, local map matching, and camera motion estimation:
After the feature points and feature lines are reconstructed in three-dimensional space, the points and lines are tracked and matched. The matching comprises two parts: matching between the previous and current images, used to estimate the camera pose at the current time, and local map matching.
To solve for the pose, assume that the rotation and translation of the left camera coordinate system O_c at the current time in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_jw in the world coordinate system O_w; its coordinates P_jc in the left camera coordinate system O_c at the current time are:

P_jc = R_cw P_jw + t_cw
A reconstructed feature line i has coordinates L_iw = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw
where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for point and line features: the point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j; the line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation. The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β are the weights of the point-feature and line-feature reprojection errors, two constants; to eliminate the influence of incorrect image feature matches, the RANSAC (Random Sample Consensus) method is used during optimization to obtain the motion estimate;
Step four, performing loop detection using the visual dictionary obtained in step one:
An image database is built containing the point and line feature descriptors extracted from the visual keyframes. The features of an image are converted into a bag-of-words vector according to the established visual dictionary; the bag-of-words vector contains the TF-IDF score of each visual word in the image (TF is the frequency of the word within one frame, IDF is the inverse document frequency described above, and TF-IDF is their product). The more frequently a visual word appears in the same frame, the higher its TF-IDF score, but the more frequently it appears across the whole image database, the lower the score;
When evaluating the similarity of two images, each image is converted into a bag-of-words vector according to its extracted features, and the similarity score is computed from the bag-of-words vectors. The image acquired by the current camera is compared against the images in the image database; a higher-scoring image acquired at the same location constitutes a closed loop, indicating the location has been visited before. Geometric consistency (enough matching pairs between the two images supporting a Euclidean transformation) and temporal consistency (the several image sequences before and after the two images are also similar) are then used to further decide whether a closed loop has been formed;
Step five, putting the point-line features, camera motion estimates, and closed-loop detections obtained in steps two to four into a keyframe-based graph optimization framework, using the orthonormal representation of lines in the back-end optimization to minimize the number of line parameters. The camera poses and the poses of the feature points and lines are optimized in the graph optimization framework, achieving camera localization and construction of the online sparse visual feature map.
On the basis of the technical scheme, the invention can also adopt the following further technical scheme:
when the point-line characteristics in the image are extracted, FAST corner point detection is adopted for the detection of the characteristic points, an ORB descriptor is used for description, and an LSD algorithm is adopted for the detection of the line characteristics to extract the line characteristics, and an LBD descriptor is used for representing the line.
For the representation of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, and the orthonormal representation of lines is used in the back-end optimization to minimize the number of line parameters.
The clustering method k-means++ is used to build an offline visual dictionary integrating point-line features, which is used online to recognize and query similar images for loop detection. By adding flag bits when building the dictionary, point and line features are treated separately in the visual dictionary and when building the image database. When evaluating the similarity of two images, the images are converted into bag-of-words vectors according to the extracted features; the vectors contain the TF-IDF score of each visual word in the image. The more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score;
The bag-of-words vector contains a point-feature part v_i^p and a line-feature part v_i^l. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b are the weights of the point-feature score and line-feature score, two constants satisfying a + b = 1.
Owing to the above technical scheme, the invention has the following beneficial effects: the visual dictionary of the invention is trained on a variety of large datasets to achieve a good clustering effect, and once built the dictionary can be reused; the method can estimate the current camera pose using as few features as possible, while local map matching involves more features, so a more accurate solution can be obtained.
Drawings
FIG. 1 is the visual dictionary model of integrated point-line features built with a clustering method according to the present invention;
FIG. 2 is a schematic representation of the Plücker coordinates of a feature line of the present invention;
FIG. 3 is a schematic diagram of endpoint selection for an infinitely long straight line in space according to the present invention;
FIG. 4 is the reprojection error model of a feature line of the present invention;
FIG. 5 is the graph model built by the present invention from the front-end point-line features, camera motion estimates, closed-loop detections, etc.
Detailed Description
For a better understanding of the technical solution of the present invention, it is further described below with reference to the accompanying drawings.
Building a visual dictionary offline using a clustering method, and determining the inverse document frequency (IDF) of each node:
To judge whether the camera revisits the same area, the features contained in each frame of image are converted into visual words. These visual words correspond to a discretized descriptor space called the visual dictionary. As shown in Fig. 1, a large number of feature descriptors, extracted from a large set of training images, are used to build a tree dictionary offline; the process of building the tree dictionary is a process of repeated clustering with the k-means++ algorithm. The descriptors here are ORB point descriptors and LBD line descriptors. Because both are 256-bit binary descriptors, they can be placed in the same visual dictionary, which simplifies both dictionary construction and loop-detection operations. In general, an image contains many point features and few line features, so point and line features are treated separately in the visual dictionary. The two 256-bit binary descriptors are each extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features. The flag bits thus distinguish line features from point features, and point and line features are likewise kept separate when building the image database online, comparing image similarity, and so on. Fig. 1 shows the clustering-based visual dictionary model of integrated point-line features. The visual dictionary should be trained on a large and diverse dataset to achieve a good clustering effect, and once built it can be reused. The weight of each node in the visual dictionary is determined by the inverse document frequency (IDF) of all the feature descriptors contained in that node:
IDF = log(N / n_i)

where N is the total number of images in the dataset and n_i is the number of images containing the feature represented by the node.
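As an illustration, the following is a minimal sketch (not from the patent itself) of how the flag-bit extension, hierarchical k-means++ clustering, and IDF weighting described above could be implemented; the helper names (`extend_with_flag`, `build_vocab_tree`, `compute_idf`) are hypothetical, and binary descriptors are treated as float vectors for clustering, which is a simplification.

```python
import numpy as np
from sklearn.cluster import KMeans

def extend_with_flag(descriptors, flag):
    """Append a flag column (0 for ORB point features, 1 for LBD line
    features) to 256-bit binary descriptors stored as uint8 rows."""
    flags = np.full((descriptors.shape[0], 1), flag, dtype=np.uint8)
    return np.hstack([descriptors, flags])

def build_vocab_tree(descriptors, branching=10, depth=4):
    """Hierarchical k-means++ clustering of the descriptor space.
    Each node stores its cluster center and children."""
    node = {"center": descriptors.mean(axis=0), "children": []}
    if depth == 0 or len(descriptors) < branching:
        return node
    km = KMeans(n_clusters=branching, init="k-means++", n_init=3)
    labels = km.fit_predict(descriptors.astype(np.float32))
    for k in range(branching):
        subset = descriptors[labels == k]
        if len(subset):
            node["children"].append(build_vocab_tree(subset, branching, depth - 1))
    return node

def compute_idf(n_images_total, n_images_with_word):
    """IDF = log(N / n_i), exactly as defined above."""
    return np.log(n_images_total / max(n_images_with_word, 1))
```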
Main steps of the online visual SLAM with integrated point-line features:
step one, acquiring a corrected image from a binocular camera, and extracting and describing features of the image
Point and line features and their descriptors are extracted from the binocular camera images. Feature points are detected with FAST corner detection and described with ORB descriptors; these are very fast to compute and match while being invariant to rotation of the viewpoint. Line features are detected with the LSD (Line Segment Detector) algorithm and represented with LBD (Line Band Descriptor) descriptors. The ORB and LBD descriptors are both 256-bit binary descriptors with the same storage structure, which is convenient for building the offline dictionary of integrated point-line features, querying the image database, and so on. This step is the same as the feature and descriptor extraction in the offline dictionary-building process.
Step two, matching left and right image features and three-dimensional reconstruction
When matching the left and right images, the feature points and the midpoints of the feature lines in the right image are projected to the left image. Because the images are rectified, only the feature with the minimum Hamming distance to the right-image feature needs to be searched for within a rectangular window in the left image; that feature is the match. The Hamming distances are sorted and a threshold is selected adaptively, rejecting matching pairs with larger distances to ensure matching accuracy.
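A minimal sketch of this rectified-stereo matching step; the helper names, window size, and the 80% quantile threshold are assumptions for illustration, not values from the patent.

```python
import numpy as np

def hamming(a, b):
    # Hamming distance between two binary descriptors stored as uint8 arrays.
    return int(np.unpackbits(a ^ b).sum())

def match_rectified(left_feats, right_feats, window=(100, 3)):
    """For each right-image feature (pixel, descriptor), search a rectangular
    window in the left image (wide along u, narrow along v since the pair is
    rectified) and keep the candidate with minimum Hamming distance."""
    matches = []
    for j, (pr, dr) in enumerate(right_feats):
        best = None
        for i, (pl, dl) in enumerate(left_feats):
            # Same scanline band, non-negative disparity within the window.
            if abs(pl[1] - pr[1]) <= window[1] and 0 <= pl[0] - pr[0] <= window[0]:
                d = hamming(dl, dr)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is not None:
            matches.append(best)
    # Adaptive threshold: sort by distance, drop the worst pairs.
    matches.sort(key=lambda m: m[0])
    cutoff = max(1, int(0.8 * len(matches)))  # keep the best 80% (assumed)
    return matches[:cutoff]
```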
Three-dimensional reconstruction of feature points:
For the rectified images, assume the matching points in the left and right images are m = [u_1, v]^T and m' = [u_2, v]^T, and the coordinates of the three-dimensional point M determined by m and m' in the left camera coordinate system are [X, Y, Z]^T. Then:

X = B (u_1 - u_c) / d

Y = B (v - v_c) / d

Z = B f / d
where B, f, u_c, and v_c are parameters of the binocular stereo vision system after image rectification: B is the baseline distance of the binocular camera, f is the focal length of the camera, [u_c, v_c]^T is the pixel coordinate of the intersection of the optical axis with the image plane, and d = u_1 - u_2 is the disparity of the matching points, which reflects the depth of the three-dimensional point.
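A direct transcription of these triangulation formulas (the calibration values in the usage comment are placeholders):

```python
import numpy as np

def triangulate(u1, u2, v, B, f, uc, vc):
    """Rectified-stereo triangulation: returns the 3-D point [X, Y, Z] in
    the left camera frame from matched pixels m = [u1, v] (left) and
    m' = [u2, v] (right)."""
    d = u1 - u2                      # disparity
    if d <= 0:
        raise ValueError("non-positive disparity: point at or behind infinity")
    X = B * (u1 - uc) / d
    Y = B * (v - vc) / d
    Z = B * f / d
    return np.array([X, Y, Z])

# Example with placeholder calibration values:
# P = triangulate(u1=412.0, u2=396.5, v=240.0, B=0.12, f=718.0, uc=320.0, vc=240.0)
```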
Three-dimensional reconstruction of characteristic straight lines:
Representing a line by two three-dimensional endpoints is clearly unsuitable, because changes of viewpoint and occlusions make it very difficult to extract and track the endpoints of a line in the image. It is therefore most appropriate to represent a three-dimensional line in space as a line of infinite length. As shown in Fig. 2, Plücker coordinates are used for line computations, including geometric transformation and three-dimensional reconstruction, while the orthonormal representation of lines is used in the back-end optimization.
For three-dimensional reconstruction of a line, and for efficient geometric transformation and computation, the Plücker coordinates L = [n^T, v^T]^T are used to represent the line, as shown in Fig. 2, where n is the normal vector of the plane π formed by the line and the camera origin O_c, and v is the direction vector of the line L. The Plücker coordinates carry the constraint that n is perpendicular to v, i.e., n · v = 0. The projection of the space line L onto the image plane is a line l, and the corresponding spatial endpoints C, D project to points c, d. In the camera coordinate system O_c, c = KC, d = KD, n = C × D, l = c × d, where c, d, l are homogeneous coordinate representations and × is the cross product. K is the camera intrinsic parameter matrix,

K = [ f_x  0  u_c ; 0  f_y  v_c ; 0  0  1 ]
from which it can be derived that the line l in the image plane satisfies l = det(K) K^{-T} n. Suppose the plane formed by the left camera center and the space line L is π_l, and the plane formed by the right camera center and the space line L is π_r; the intersection of the two planes is the space line. The plane π_l is expressed as:

π_l = P_l^T l_l ∈ R^4

where l_l is the image of the space line in the left camera image plane and P_l is the projection matrix of the left camera,

P_l = K_l [I | 0]

where K_l is the intrinsic parameter matrix of the left camera, I is the 3 × 3 identity matrix, and 0 is a 3 × 1 zero vector. Similarly, the plane formed by the right camera center and the space line L can be obtained from the camera extrinsic parameters; its homogeneous coordinate representation is π_r. The intersection of the two planes is the space line L, whose dual Plücker matrix is expressed as
L* = π_l π_r^T - π_r π_l^T
The relationship between the dual prock matrix and the prock coordinate representation is:
Figure GDA0002162924780000091
the prock coordinates can be obtained by using the above formula.
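A minimal sketch of this two-plane line reconstruction, assuming the 3×4 projection matrices of both cameras are known (helper names are hypothetical):

```python
import numpy as np

def plane_from_image_line(P, l):
    """Back-project an image line l (3-vector, homogeneous) through a camera
    with 3x4 projection matrix P: pi = P^T l is a plane in R^4."""
    return P.T @ l

def plucker_from_planes(pi_l, pi_r):
    """Dual Pluecker matrix L* = pi_l pi_r^T - pi_r pi_l^T; read the line's
    Pluecker coordinates (n, v) off its block structure
    L* = [[v]_x, n; -n^T, 0]."""
    Lstar = np.outer(pi_l, pi_r) - np.outer(pi_r, pi_l)
    n = Lstar[:3, 3]
    # The top-left 3x3 block is the skew matrix [v]_x; recover v from it.
    S = Lstar[:3, :3]
    v = np.array([S[2, 1], S[0, 2], S[1, 0]])
    return n, v
```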
The above is the three-dimensional reconstruction of a feature line. In addition, since a scene map is to be built and the space line L is infinitely long, the space line needs to be trimmed for display, i.e., two endpoints C, D of the line are maintained. The endpoints C, D on the space line L can be determined, according to a fixed rule, from the endpoints of the image l of the space line L in the left camera image plane by geometric transformation; Fig. 3 shows this endpoint selection schematically. In the figure, ec is a segment perpendicular to l in the left camera image plane, and the distance between points e and c can be set to any value. The plane π' is the plane determined by the segment ec and the camera center O_c. The endpoint C is obtained by intersecting the space line L with the plane π'; the endpoint D is obtained similarly. During camera motion, the endpoints c and d of the image l of the same space line L on the left camera image plane are not fixed, so the trimmed C, D also vary; only the pair C, D with the largest distance is kept as the endpoints of the line maintained in space.
Step three, matching features between the previous and current images, and camera motion estimation
After feature matching and three-dimensional reconstruction of the left and right images, the three-dimensional coordinates P_jw of feature point j and the Plücker coordinates L_iw of feature line i in the world coordinate system are obtained. After matching the previous and current images, the projection p_j of feature point j and the projection l_i of feature line i in the left image at the current time are obtained. Assume the rotation and translation of the current left camera coordinate system O_c in the world coordinate system O_w are R_wc and t_wc. Then the feature point has coordinates P_jc = R_cw P_jw + t_cw in the left camera coordinate system O_c at the current time, and the feature line i has coordinates

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw

where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for the point and line features. The point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j:

e_pj = || p̂_j - p_j ||
The line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation:

e_li = [ l_c^T ep1_i ; l_c^T ep2_i ] / sqrt(l_c1² + l_c2²)

where ep1_i = [ep1_i1, ep1_i2, 1]^T is the homogeneous coordinate representation of endpoint ep1_i, ep2_i is likewise the homogeneous coordinate representation of endpoint ep2_i, and l_c = [l_c1, l_c2, l_c3]^T is the vector of coefficients of the line equation l_c.
The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β, the weights of the point-feature and line-feature reprojection errors, are two constants that can be set empirically; to eliminate the influence of incorrect image feature matches, the RANSAC method can be used in this step to obtain a better motion estimate.
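A compact sketch of this RANSAC loop around the least-squares objective; this is a simplified illustration rather than the patent's exact procedure: `estimate_pose` and `point_error` are hypothetical helpers, only point features are shown, and the iteration count, sample size, and inlier threshold are assumptions.

```python
import numpy as np

def ransac_motion(points_3d, points_2d, estimate_pose, point_error,
                  iters=100, sample_size=3, inlier_thresh=2.0):
    """RANSAC over 3D-2D point matches: repeatedly fit a pose to a minimal
    sample, count inliers by reprojection error (pixels), and refit on the
    best inlier set. estimate_pose(P3, p2) -> (R, t)."""
    rng = np.random.default_rng(0)
    best_inliers = []
    n = len(points_3d)
    for _ in range(iters):
        idx = rng.choice(n, size=sample_size, replace=False)
        R, t = estimate_pose(points_3d[idx], points_2d[idx])
        errs = np.array([point_error(R, t, P, p)
                         for P, p in zip(points_3d, points_2d)])
        inliers = np.where(errs < inlier_thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final refinement on all inliers: this is where the weighted
    # point+line least-squares objective above would be minimized.
    R, t = estimate_pose(points_3d[best_inliers], points_2d[best_inliers])
    return R, t, best_inliers
```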
Step four, loop detection using the offline-trained visual dictionary
An image database is built from the point and line feature descriptors extracted from the visual keyframes. The distances between all descriptors extracted from a keyframe and the cluster centers, i.e., the nodes in the visual dictionary, are computed; one layer in the dictionary tree is selected as the comparison level (typically levels 4-6), and every extracted descriptor is assigned to the node of the dictionary tree closest to it at that level. From this assignment the image can be discretized into a bag-of-words vector whose dimension is the number of nodes at the comparison level; the vector contains the TF-IDF score of each visual word in the image, and it has a point-feature part v_i^p and a line-feature part v_i^l. The more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score. TF-IDF is:
TF-IDF = IDF · (n_iIt / n_It)

where n_iIt is the number of occurrences of the visual word in image I_t, n_It is the total number of visual words in image I_t, and IDF is the inverse document frequency of the visual word in the established offline visual dictionary.
The newly generated bag-of-words vector is then compared with the bag-of-words vectors in the image database for similarity judgment. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b, the weights of the point-feature score and line-feature score, are two constants satisfying a + b = 1 that can be set empirically. Loop detection based on similarity alone can produce false detections, so other information is needed. Images close in time in the database generally receive similar scores; exploiting this, images close in time are grouped, and scores are compared group by group, the score of an image group being the sum of the scores of each frame in the group. A frame's score must exceed a certain threshold before it is added to the group score. After searching the entire image database, the group with the highest score is selected, and the image with the highest single-frame score within it is taken as the candidate closed-loop image. Finally, the closed-loop image pair is confirmed using strategies such as geometric verification (comparing all feature points in the images) and temporal consistency (the images in the time windows before and after the closed-loop pair are also similar).
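The scoring described above might look like the following sketch; the per-part similarity s(·,·) here is the common normalized L1 bag-of-words score, which is an assumption since the patent does not fix its form, and the 0.7/0.3 weight split is a placeholder.

```python
import numpy as np

def tfidf_vector(word_counts, idf, n_words_in_image):
    """Bag-of-words vector: TF-IDF = IDF * (n_iIt / n_It) per visual word.
    word_counts and idf are aligned arrays over the comparison-level nodes."""
    return idf * (word_counts / max(n_words_in_image, 1))

def l1_score(u, w):
    # Normalized L1 similarity in [0, 1] (assumed form of s(.,.)).
    u = u / (np.abs(u).sum() + 1e-12)
    w = w / (np.abs(w).sum() + 1e-12)
    return 1.0 - 0.5 * np.abs(u - w).sum()

def similarity(v1_p, v1_l, v2_p, v2_l, a=0.7, b=0.3):
    """s(v1, v2) = a * s(v1^p, v2^p) + b * s(v1^l, v2^l), with a + b = 1.
    The flag bits let the point part and line part be scored separately."""
    return a * l1_score(v1_p, v2_p) + b * l1_score(v1_l, v2_l)
```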
Step five, putting the point-line features, camera motion estimates, closed-loop detections, etc., obtained at the front end into a keyframe-based graph optimization framework
The objective function to be optimized, namely the error models of the point features and line features and the closed-loop constraint error model, is formulated. This is a nonlinear optimization problem: a graph model can be built and then solved by iterative optimization, exploiting sparsity, with open-source graph optimization tools such as g2o (General Graph Optimization), GTSAM (Georgia Tech Smoothing and Mapping), or Ceres Solver. The result is the optimized camera positions and attitudes and the points and lines in space.
Error model of point features:
Assume the rotation and translation of the left camera coordinate system O_c at the current time i in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_wj in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

P_ij = R_cw P_wj + t_cw

P_ij = [x_ij, y_ij, z_ij]^T
P_ij is projected into the left camera image by the camera projection model, with image coordinates

p̂_ij = π(P_ij)

where π is the projection equation:

π(P_ij) = [ f_x x_ij / z_ij + u_c ; f_y y_ij / z_ij + v_c ]
where f_x, f_y are the focal lengths of the camera in the horizontal and vertical directions and (u_c, v_c) is the imaging origin (principal point) of the camera; these are the camera intrinsics.
The reprojection error of the point is defined as the distance e_ij between the projected coordinates p̂_ij of the feature point and the observed coordinates p_ij:

e_ij = p̂_ij - p_ij
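A direct transcription of this point error model (camera intrinsics would be supplied by calibration; any values used in a call are placeholders):

```python
import numpy as np

def project(P_c, fx, fy, uc, vc):
    """Pinhole projection pi(P): camera-frame point [x, y, z] -> pixel."""
    x, y, z = P_c
    return np.array([fx * x / z + uc, fy * y / z + vc])

def point_error(R_cw, t_cw, P_w, p_obs, fx, fy, uc, vc):
    """Reprojection error e = pi(R_cw P_w + t_cw) - p_obs."""
    P_c = R_cw @ P_w + t_cw
    return project(P_c, fx, fy, uc, vc) - p_obs
```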
Error model of line features:
A reconstructed feature line k has coordinates L_wk = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ik = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_wk

As shown in Fig. 4, the feature line L_ik is projected into the current left camera image, giving the projected line equation

l̂_ik = det(K) K^{-T} n_ik

where n_ik is the normal component of L_ik.
The projected line of the line L on the left camera image plane is l̂_ik, and the observed line segment is l_ik. The distances d_l1, d_l2 from the endpoints of the observed segment l_ik to the projected line l̂_ik are set as the error function:

e_lk = [ d_l1 ; d_l2 ] = [ a^T l̂_ik ; b^T l̂_ik ] / sqrt(l̂_ik1² + l̂_ik2²)
where a = [a_1, a_2, 1]^T is the homogeneous coordinate of endpoint a, b = [b_1, b_2, 1]^T is the homogeneous coordinate of endpoint b, and l̂_ik = [l̂_ik1, l̂_ik2, l̂_ik3]^T is the vector of coefficients of the projected line equation.
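The line error model, transcribed as a sketch; the Plücker transform and projection follow the formulas above, and the signed (rather than absolute) distances are a convention choice.

```python
import numpy as np

def skew(t):
    # 3x3 skew-symmetric matrix [t]_x such that [t]_x v = t x v.
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def line_error(R_cw, t_cw, n_w, v_w, a_obs, b_obs, K):
    """Distances from the observed segment endpoints a, b (pixels) to the
    projected line of the world line (n_w, v_w) in Pluecker coordinates."""
    # Transform the line into the camera frame: n_c = R n_w + [t]_x R v_w.
    n_c = R_cw @ n_w + skew(t_cw) @ (R_cw @ v_w)
    # Project: l = det(K) K^{-T} n_c.
    l = np.linalg.det(K) * np.linalg.inv(K).T @ n_c
    a_h = np.array([a_obs[0], a_obs[1], 1.0])
    b_h = np.array([b_obs[0], b_obs[1], 1.0])
    norm = np.hypot(l[0], l[1])
    return np.array([a_h @ l, b_h @ l]) / norm
```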
In the back-end optimization, to minimize the number of line parameters and prevent over-parameterization, the line is parameterized with the orthonormal representation (U, W) ∈ SO(3) × SO(2), where SO(3) is the group of three-dimensional rotation matrices and SO(2) the group of two-dimensional rotation matrices, with 3 and 1 degrees of freedom respectively:
U = [ n/||n||   v/||v||   (n×v)/||n×v|| ]

Let

W = (1 / sqrt(||n||² + ||v||²)) [ ||n||   -||v|| ; ||v||   ||n|| ]

Here a minimum of four parameters are used,

δ = [θ^T, φ]^T

where θ is a 3 × 1 vector and φ is a scalar. The representation is updated through

U* ← R(θ) U,   W* ← R(φ) W

to update (U, W) ∈ SO(3) × SO(2).
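A sketch of the Plücker-to-orthonormal conversion and its minimal 4-DoF update, following the standard orthonormal line representation; whether the update multiplies on the left (R(θ)U) or right (UR(θ)) is a convention choice.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def plucker_to_orthonormal(n, v):
    """(n, v) -> (U, W) with U in SO(3), W in SO(2)."""
    U = np.column_stack([n / np.linalg.norm(n),
                         v / np.linalg.norm(v),
                         np.cross(n, v) / np.linalg.norm(np.cross(n, v))])
    s = np.hypot(np.linalg.norm(n), np.linalg.norm(v))
    W = np.array([[np.linalg.norm(n), -np.linalg.norm(v)],
                  [np.linalg.norm(v),  np.linalg.norm(n)]]) / s
    return U, W

def update(U, W, theta, phi):
    """Minimal update delta = [theta, phi]: U <- R(theta) U, W <- R(phi) W."""
    U_new = Rotation.from_rotvec(theta).as_matrix() @ U
    Rphi = np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])
    return U_new, Rphi @ W

def orthonormal_to_plucker(U, W):
    """Recover (n, v) up to a common scale: n = W[0,0] u1, v = W[1,0] u2."""
    return W[0, 0] * U[:, 0], W[1, 0] * U[:, 1]
```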
Error model of the closed-loop constraint:
Suppose the position and attitude x_i of the camera at a certain time is detected, by the closed-loop detection method, to be at the same place as a position i' already visited, i.e., a closed-loop pair x_i and x_i' is found, generating a closed-loop constraint C_l. The error of the closed-loop constraint is e_c = x_i - g(x_i', C_l), where g is the function that computes the position and attitude at one time of the closed-loop matching pair from the position and attitude at the other time and the closed-loop constraint.
Taking the feature points and feature lines as landmarks l, the camera poses x and the landmarks l as nodes of the graph model, and the loop detections C_l and the binocular camera observations Z as edges, a graph model is established, as shown in Fig. 5. The problem graph optimization solves is to continually optimize the variables l, x while u, Z, C_l are known, so the known u, Z, C_l are taken as the observations Z and the variables l, x as the state X. The graph optimization model maximizes the joint probability to obtain l*, x*:

(l*, x*) = argmax_{l,x} P(l, x | Z)

P(X | Z) ∝ ∏_k P(z_k | X)
The observation Z between states X_i, X_j is assumed to have observation error e(X_i, X_j), i.e., one of the four errors defined above. Assuming all errors obey zero-mean Gaussian distributions with covariance Σ_ij, then

P(z_k | X_i, X_j) ∝ exp( -½ e(X_i, X_j)^T Σ_ij^{-1} e(X_i, X_j) )
Taking the negative logarithm of the above equation, the objective function of the graph optimization model will become:
Figure GDA0002162924780000145
This is a nonlinear optimization problem and can be solved within a graph optimization framework by methods such as Gauss-Newton, LM (Levenberg-Marquardt), and Dogleg (Powell's method).
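As an illustration of the back-end solve, here is a minimal sketch that stacks the weighted residuals and hands them to a Levenberg-Marquardt solver; a real implementation would use g2o, GTSAM, or Ceres Solver with analytic Jacobians and manifold parameterizations, so this is only a conceptual stand-in.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_graph(x0, residual_blocks):
    """x0: stacked state vector (poses and landmarks, minimally
    parameterized). residual_blocks: list of (fn, sqrt_info) pairs, where
    fn(x) returns a residual vector e and sqrt_info is a square root of the
    information matrix Sigma^{-1}, so the stacked cost equals
    sum_ij e^T Sigma^{-1} e."""
    def stacked(x):
        return np.concatenate([sqrt_info @ fn(x)
                               for fn, sqrt_info in residual_blocks])
    result = least_squares(stacked, x0, method="lm")  # Levenberg-Marquardt
    return result.x
```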

Claims (4)

1. A visual simultaneous mapping and localization method based on integrated point-line features, characterized by comprising the following two parts: building a visual dictionary offline and building a sparse visual feature map online:
First, a tree-shaped visual dictionary, i.e., a KD-tree of the descriptor space, is built offline using a clustering method, and the inverse document frequency of each node in the tree is determined, each node being a cluster center of descriptors:
The features contained in each frame of image are converted into visual words, i.e., feature descriptors; hierarchical clustering is performed on the visual vocabulary to build a KD-tree of the descriptor space, called the visual dictionary; the tree-shaped visual dictionary is built offline from feature descriptors extracted from a set of training images. The descriptors are the ORB (Oriented FAST and Rotated BRIEF) point feature descriptor and the LBD (Line Band Descriptor) line feature descriptor; both are binary descriptors, and each is extended: a flag bit 0 is appended to ORB point features and a flag bit 1 to LBD line features, so that the flag bits distinguish line features from point features; before obtaining the LBD line feature descriptor, the line is first detected with LSD and then described with the LBD descriptor;
The weight of each node in the visual dictionary is determined by the inverse document frequency of all the feature descriptors contained in that node;
Then, a sparse visual feature map is built online, comprising the following steps:
Step one, acquiring rectified images from the binocular camera, and extracting and describing features of the rectified images:
Extracting the point-line features and their descriptors in the rectified images, extracting ORB point feature descriptors and LBD line feature descriptors online;
Step two, performing feature matching and three-dimensional reconstruction on the rectified binocular images:
Matching the feature points and feature lines in the rectified images and establishing matching pairs; performing three-dimensional reconstruction of the feature points and feature lines using the binocular vision imaging model, with feature lines expressed in Plücker coordinates during reconstruction and the line endpoints maintained; building a sparse feature map of integrated point-line features from the feature points and feature lines, with Plücker coordinates used for the representation and computation of lines;
Step three, image matching between previous and current frames, local map matching, and camera motion estimation:
After the feature points and feature lines are reconstructed in three-dimensional space, the points and lines are tracked and matched. The matching comprises two parts: matching between the previous and current images, used to estimate the camera pose at the current time, and local map matching.
To solve for the pose, assume that the rotation and translation of the left camera coordinate system O_c at the current time in the world coordinate system O_w are R_wc and t_wc. A reconstructed feature point j has coordinates P_jw in the world coordinate system O_w; its coordinates P_jc in the left camera coordinate system O_c at the current time are:

P_jc = R_cw P_jw + t_cw
A reconstructed feature line i has coordinates L_iw = [n^T, v^T]^T in the world coordinate system O_w; its coordinates in the left camera coordinate system O_c at the current time are:

L_ic = [ R_cw   [t_cw]_× R_cw ; 0   R_cw ] L_iw
where R_cw = R_wc^T and t_cw = -R_wc^T t_wc are the rotation and translation of the world coordinate system in the left camera coordinate system, and [t_cw]_× is the 3 × 3 skew-symmetric matrix constructed from the vector t_cw. The feature point P_jc is projected into the current left camera image through the pinhole camera model, giving projected image coordinates p̂_j.
The feature line L_ic is projected into the current left camera image, giving the projected line equation l_i. Errors are defined separately for point and line features: the point error is the reprojection error, i.e., the distance e_pj between the projected coordinates p̂_j of the feature point and the observed coordinates p_j; the line error is the geometric distance e_li from the two endpoints ep1_i, ep2_i of the observed line segment to the projected line equation. The goal of motion estimation is to solve the following nonlinear least squares problem:

{R_cw, t_cw}* = argmin_{R_cw, t_cw} ( α Σ_j ||e_pj||² + β Σ_i ||e_li||² )
where α and β are the weights of the point-feature and line-feature reprojection errors, two constants; to eliminate the influence of incorrect image feature matches, the RANSAC method is used during optimization to obtain the motion estimate;
Step four, performing loop detection using the visual dictionary obtained in step one:
Point and line feature descriptors are extracted from the visual keyframes and an image database is built containing the point and line feature descriptors of each keyframe. The features of an image are converted into a bag-of-words vector according to the established visual dictionary; the bag-of-words vector contains the TF-IDF score of each visual word in the image, where TF is the frequency of the word within one frame, IDF is the inverse document frequency described above, and TF-IDF is their product. The more frequently a visual word appears in the same frame, the higher its TF-IDF score, but the more frequently it appears across the whole image database, the lower the score;
When evaluating the similarity of two images, each image is converted into a bag-of-words vector according to its extracted features, and the similarity score is computed from the bag-of-words vectors. The image acquired by the current camera is compared against the images in the image database; a higher-scoring image acquired at the same location constitutes a closed loop, indicating the location has been visited before. Geometric consistency (enough matching pairs between the two images supporting a Euclidean transformation) and temporal consistency (the several image sequences before and after the two images are also similar) are then used to further decide whether a closed loop has been formed;
Step five, putting the point-line features, camera motion estimates, and closed-loop detections obtained in steps two to four into a keyframe-based graph optimization framework, using the orthonormal representation of lines in the back-end optimization to minimize the number of line parameters; the camera poses and the poses of the feature points and lines are optimized in the graph optimization framework, achieving camera localization and construction of the online sparse visual feature map.
2. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein, when extracting the point-line features in the image, feature points are detected with FAST corner detection and described with ORB descriptors, and line features are detected and extracted with the LSD algorithm and represented with LBD descriptors.
3. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein, for the representation of line features, Plücker coordinates are used in line computations, including geometric transformation and three-dimensional reconstruction, and the orthonormal representation of lines is used in the back-end optimization to minimize the number of line parameters.
4. The visual simultaneous mapping and localization method based on integrated point-line features according to claim 1, wherein the clustering method k-means++ is used to build an offline visual dictionary integrating point-line features, used online to recognize and query similar images for loop detection; by adding flag bits when building the dictionary, point and line features are treated separately in the visual dictionary and when building the image database; when evaluating the similarity of two images, the images are converted into bag-of-words vectors according to the extracted features, containing the TF-IDF score of each visual word in the image; the more frequently a word appears in the same frame, the higher its score, but the more frequently it appears across the whole dataset, the lower its score;
The bag-of-words vector contains a point-feature part v_i^p and a line-feature part v_i^l. The similarity of two bag-of-words vectors v_1, v_2 is defined as:

s(v_1, v_2) = a · s(v_1^p, v_2^p) + b · s(v_1^l, v_2^l)
where a and b are the weights of the point-feature score and line-feature score, two constants satisfying a + b = 1.
CN201611142482.XA 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features Active CN106909877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611142482.XA CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611142482.XA CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Publications (2)

Publication Number Publication Date
CN106909877A CN106909877A (en) 2017-06-30
CN106909877B true CN106909877B (en) 2020-04-14

Family

ID=59206482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611142482.XA Active CN106909877B (en) 2016-12-13 2016-12-13 Visual simultaneous mapping and localization method based on integrated point-line features

Country Status (1)

Country Link
CN (1) CN106909877B (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392964B (en) * 2017-07-07 2019-09-17 武汉大学 The indoor SLAM method combined based on indoor characteristic point and structure lines
CN107329490B (en) * 2017-07-21 2020-10-09 歌尔科技有限公司 Unmanned aerial vehicle obstacle avoidance method and unmanned aerial vehicle
CN107752910A (en) * 2017-09-08 2018-03-06 珠海格力电器股份有限公司 Region cleaning method, device, storage medium, processor and sweeping robot
CN107680133A (en) * 2017-09-15 2018-02-09 重庆邮电大学 A kind of mobile robot visual SLAM methods based on improvement closed loop detection algorithm
WO2019057179A1 (en) 2017-09-22 2019-03-28 华为技术有限公司 Visual slam method and apparatus based on point and line characteristic
CN109558879A (en) * 2017-09-22 2019-04-02 华为技术有限公司 A kind of vision SLAM method and apparatus based on dotted line feature
CN107885224A (en) * 2017-11-06 2018-04-06 北京韦加无人机科技股份有限公司 Unmanned plane barrier-avoiding method based on tri-item stereo vision
CN107869989B (en) * 2017-11-06 2020-02-07 东北大学 Positioning method and system based on visual inertial navigation information fusion
CN107784671B (en) * 2017-12-01 2021-01-29 驭势科技(北京)有限公司 Method and system for visual instant positioning and drawing
CN108090959B (en) * 2017-12-07 2021-09-10 中煤航测遥感集团有限公司 Indoor and outdoor integrated modeling method and device
CN108230337B (en) * 2017-12-31 2020-07-03 厦门大学 Semantic SLAM system implementation method based on mobile terminal
CN108107897B (en) * 2018-01-11 2021-04-16 驭势科技(北京)有限公司 Real-time sensor control method and device
CN108363387B (en) * 2018-01-11 2021-04-16 驭势科技(北京)有限公司 Sensor control method and device
CN110399892B (en) * 2018-04-24 2022-12-02 北京京东尚科信息技术有限公司 Environmental feature extraction method and device
CN108682027A (en) * 2018-05-11 2018-10-19 北京华捷艾米科技有限公司 VSLAM realization method and systems based on point, line Fusion Features
CN108961322B (en) * 2018-05-18 2021-08-10 辽宁工程技术大学 Mismatching elimination method suitable for landing sequence images
CN108921896B (en) * 2018-06-15 2021-04-30 浙江大学 Downward vision compass integrating dotted line characteristics
CN109074676B (en) * 2018-07-03 2023-07-07 达闼机器人股份有限公司 Method for establishing map, positioning method, terminal and computer readable storage medium
CN109101981B (en) * 2018-07-19 2021-08-24 东南大学 Loop detection method based on global image stripe code in streetscape scene
CN109034237B (en) * 2018-07-20 2021-09-17 杭州电子科技大学 Loop detection method based on convolutional neural network signposts and sequence search
CN109165680B (en) * 2018-08-01 2022-07-26 东南大学 Single-target object dictionary model improvement method in indoor scene based on visual SLAM
CN109166149B (en) * 2018-08-13 2021-04-02 武汉大学 Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN109409418B (en) * 2018-09-29 2022-04-15 中山大学 Loop detection method based on bag-of-words model
CN109493385A (en) * 2018-10-08 2019-03-19 上海大学 Autonomic positioning method in a kind of mobile robot room of combination scene point line feature
CN109752003B (en) * 2018-12-26 2021-03-02 浙江大学 Robot vision inertia point-line characteristic positioning method and device
CN110033514B (en) * 2019-04-03 2021-05-28 西安交通大学 Reconstruction method based on point-line characteristic rapid fusion
CN111830517B (en) * 2019-04-17 2023-08-01 北京地平线机器人技术研发有限公司 Method and device for adjusting laser radar scanning range and electronic equipment
CN110375732A (en) * 2019-07-22 2019-10-25 中国人民解放军国防科技大学 Monocular camera pose measurement method based on inertial measurement unit and point line characteristics
CN110473258B (en) * 2019-07-24 2022-05-13 西北工业大学 Monocular SLAM system initialization algorithm based on point-line unified framework
CN110455301A (en) * 2019-08-01 2019-11-15 河北工业大学 A kind of dynamic scene SLAM method based on Inertial Measurement Unit
CN111076733B (en) * 2019-12-10 2022-06-14 亿嘉和科技股份有限公司 Robot indoor map building method and system based on vision and laser slam
CN111310772B (en) * 2020-03-16 2023-04-21 上海交通大学 Point line characteristic selection method and system for binocular vision SLAM
CN111899334B (en) * 2020-07-28 2023-04-18 北京科技大学 Visual synchronous positioning and map building method and device based on point-line characteristics
CN112085790A (en) * 2020-08-14 2020-12-15 香港理工大学深圳研究院 Point-line combined multi-camera visual SLAM method, equipment and storage medium
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112507778B (en) * 2020-10-16 2022-10-04 天津大学 Loop detection method of improved bag-of-words model based on line characteristics
CN113298014B (en) * 2021-06-09 2021-12-17 安徽工程大学 Closed loop detection method, storage medium and equipment based on reverse index key frame selection strategy
CN113393524B (en) * 2021-06-18 2023-09-26 常州大学 Target pose estimation method combining deep learning and contour point cloud reconstruction
CN113514067A (en) * 2021-06-24 2021-10-19 上海大学 Mobile robot positioning method based on point-line characteristics
CN113432593B (en) * 2021-06-25 2023-05-23 北京华捷艾米科技有限公司 Centralized synchronous positioning and map construction method, device and system
CN113450412B (en) * 2021-07-15 2022-06-03 北京理工大学 Visual SLAM method based on linear features
CN113532431A (en) * 2021-07-15 2021-10-22 贵州电网有限责任公司 Visual inertia SLAM method for power inspection and operation
CN114789446A (en) * 2022-05-27 2022-07-26 平安普惠企业管理有限公司 Robot pose estimation method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012192090A (en) * 2011-03-17 2012-10-11 Kao Corp Information processing method, method for estimating orbitale, method for calculating frankfurt plane, and information processor
CN102855649A (en) * 2012-08-23 2013-01-02 山东电力集团公司电力科学研究院 Method for splicing high-definition image panorama of high-pressure rod tower on basis of ORB (Object Request Broker) feature point
CN102967297A (en) * 2012-11-23 2013-03-13 浙江大学 Space-movable visual sensor array system and image information fusion method
CN104639932A (en) * 2014-12-12 2015-05-20 浙江大学 Free stereoscopic display content generating method based on self-adaptive blocking
CN104915949A (en) * 2015-04-08 2015-09-16 华中科技大学 Image matching algorithm of bonding point characteristic and line characteristic
CN106022304A (en) * 2016-06-03 2016-10-12 浙江大学 Binocular camera-based real time human sitting posture condition detection method


Also Published As

Publication number Publication date
CN106909877A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106909877B (en) Visual simultaneous mapping and localization method based on integrated point-line features
CN110335319B (en) Semantic-driven camera positioning and map reconstruction method and system
Gálvez-López et al. Real-time monocular object slam
Wang et al. Sketch-based 3d shape retrieval using convolutional neural networks
Khan et al. IBuILD: Incremental bag of binary words for appearance based loop closure detection
US9449392B2 (en) Estimator training method and pose estimating method using depth image
CN107796397A (en) A kind of Robot Binocular Vision localization method, device and storage medium
Mei et al. Closing loops without places
CN111462210B (en) Monocular line feature map construction method based on epipolar constraint
CN110717927A (en) Indoor robot motion estimation method based on deep learning and visual inertial fusion
CN111311708B (en) Visual SLAM method based on semantic optical flow and inverse depth filtering
CN112562081B (en) Visual map construction method for visual layered positioning
CN110119768B (en) Visual information fusion system and method for vehicle positioning
CN112200915A (en) Front and back deformation amount detection method based on target three-dimensional model texture image
Tardós et al. Real-time monocular object SLAM
Lu et al. Large-scale tracking for images with few textures
SANDOVAL et al. Robust sphere detection in unorganized 3D point clouds using an efficient Hough voting scheme based on sliding voxels
CN113570713B (en) Semantic map construction method and device for dynamic environment
CN110930519B (en) Semantic ORB-SLAM sensing method and device based on environment understanding
CN115330861A (en) Repositioning algorithm based on object plane common representation and semantic descriptor matching
CN113888603A (en) Loop detection and visual SLAM method based on optical flow tracking and feature matching
Jaenal et al. Unsupervised appearance map abstraction for indoor visual place recognition with mobile robots
Tomoya et al. Change detection under global viewpoint uncertainty
Moura et al. VEM-SLAM-Virtual environment modelling through SLAM
Loncomilla et al. Visual SLAM based on rigid-body 3D landmarks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant