CN108986037B - Monocular vision odometer positioning method and positioning system based on semi-direct method - Google Patents
- Publication number
- Publication number: CN108986037B (application number CN201810512342.XA)
- Authority
- CN
- China
- Prior art keywords
- points
- point
- frame
- image
- camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C11/04—Interpretation of pictures
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C22/00—Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Abstract
The invention discloses a monocular visual odometer positioning method and positioning system based on a semi-direct method. The method comprises the following steps: entering a target scene with a monocular camera and recording scene image data; preprocessing the obtained scene images and detecting feature points; initializing the system, tracking the obtained feature points, using the tracked feature point pair set as the data set of a robust algorithm, extracting line segment features in the scene images, and estimating the camera pose by a semi-direct method; selecting key frames among the images shot by the camera; and constructing a map. The invention realizes a robust, high-quality initialization process. When each three-dimensional feature is first added into the depth filter, a key frame is selected to update the depth information of that three-dimensional feature once, so that the uncertainty of the new three-dimensional feature's depth information is reduced and the convergence of the depth is accelerated; moreover, the reference frame of the three-dimensional features is updated, shortening the time interval between the reference frame and the current frame and reducing sensitivity to illumination changes.
Description
Technical Field
The invention belongs to the technical field of visual odometry in three-dimensional reconstruction, and particularly relates to a monocular visual odometer positioning method and a monocular visual odometer positioning system based on a semi-direct method.
Background
With the continuous evolution of computer vision, image processing and other technologies in recent years, vision sensors are applied in more and more scenes, and odometers based on vision sensors are receiving more and more attention from researchers. Vision sensors are becoming one of the main sensors for odometry applications due to their low cost, rich information and other advantages. A visual odometer is more efficient and requires less hardware than full SLAM (simultaneous localization and mapping). The traditional wheel odometer mainly calculates mileage through the rolling of the wheels, but a wheel may slip and idle on a rugged road surface, which brings large errors to the odometry measurement; an odometer based on a vision sensor is not affected by slipping and idling.
The current mainstream visual odometers, classified by hardware, can be roughly divided into monocular systems using only one camera and binocular systems using a stereo camera. Monocular means using only one camera as the sole sensor, while binocular means using a stereo camera to capture two images simultaneously at each moment. Compared with a binocular visual odometer, a monocular visual odometer has the advantages of low cost and no need to consider the influence of the camera baseline on system precision.
In 2014, Christian Forster proposed a visual odometer combining the features of the direct method and the feature point method, named SVO. SVO first extracts FAST feature points from the image but does not compute their descriptors; instead it estimates the initial pose of the camera with a direct method, which saves a large amount of feature detection and matching time. Since only FAST feature points are computed, its speed is greatly improved, making it very suitable for platforms with limited computing resources such as unmanned aerial vehicles and smartphones. However, SVO is not robust enough, because it pursues extreme speed.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly innovatively provides a monocular vision odometer positioning method and a monocular vision odometer positioning system based on a semi-direct method.
In order to achieve the above object of the present invention, according to a first aspect of the present invention, there is provided a monocular visual odometer positioning method based on a semi-direct method, comprising the steps of:
s1, entering a target scene by using an unmanned aerial vehicle or a vehicle-mounted monocular camera and recording scene image data;
s2, preprocessing the scene image obtained in the step S1, correcting distortion and detecting characteristic points of the corrected image;
S3, initializing the system, tracking the characteristic points obtained in step S2, taking the tracked feature point pair set as the data set of a robust algorithm, estimating the fundamental matrix F of the two images by the eight-point method, calculating the homography matrix H from four point pairs, and selecting one of the fundamental matrix F and the homography matrix H for subsequent calculation by counting the inlier pairs of the two matrices;
S4, tracking and positioning the camera to form the motion track of the camera: extracting line segment features in the scene image and estimating the camera pose by a semi-direct method. First, sparse image alignment: a photometric error is established through the feature points on the current frame image and the previous frame image to obtain an initial camera pose; then, feature alignment: all map points in the map are projected onto the current frame using the camera pose obtained in the previous step, photometric errors are established from the gray difference between the projection points and the map points on the reference frame, and the positions of the projection points on the image are optimized; finally, a reprojection error with respect to the three-dimensional points and the camera pose is established using the optimized projection point positions, and the camera pose and the three-dimensional points are optimized to obtain the final camera pose of the current frame;
s5, selecting key frames in the camera shooting images;
S6, constructing a map: carrying out depth filtering on the end points of the line segment features in a key frame; newly constructing seed points from the newly acquired key frame, pre-updating the seed points once by using several key frames closest to the newly acquired key frame, and adding the seed points into the seed point queue; if the seed point queue has converged, newly constructing a key frame, projecting all three-dimensional points in the map onto the new key frame and, for each three-dimensional point visible on the new key frame, changing its reference frame to the new key frame; if the seed point queue has not converged, continuing to acquire key frames and update the seed point queue.
In order to achieve the above object, according to a second aspect of the present invention, the present invention provides a monocular vision odometer positioning system based on a semi-direct method, which includes a monocular vision camera and a processor, wherein the monocular vision camera is carried by an unmanned aerial vehicle or a vehicle, the monocular vision camera acquires images and transmits the images to the processor, and the processor processes the images by using the method of the present invention, constructs a map, positions and displays a movement track of the camera on the map.
The invention adopts a visual odometer to measure mileage, thereby reducing odometry measurement error. The conventional wheel odometer mainly calculates mileage through the rolling of the wheels, but a wheel may slip or idle on a rugged road surface, which brings large errors to the odometry measurement; an odometer based on a vision sensor is not affected by slipping or idling.
Compared with a binocular or even multi-ocular visual odometer, the monocular visual odometer has the advantages of low cost and no need to consider the influence of the camera baseline on system precision.
The invention realizes a robust, high-quality initialization process. The new key frame selection strategy enhances the scene-depth-based key frame selection strategy and removes the limitation that such a strategy can only be used when the camera looks downward.
According to the method, when each three-dimensional feature is first added into the depth filter, a key frame is selected to update the depth information of that three-dimensional feature once, so that the uncertainty of the new three-dimensional feature's depth information is reduced and the convergence of the depth is accelerated; moreover, the reference frame of the three-dimensional features is updated, shortening the time interval between the reference frame and the current frame and reducing sensitivity to illumination changes.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a monocular visual odometer positioning method based on the semi-direct method in a preferred embodiment of the present invention;
FIG. 2 is a diagram showing the effect of distortion correction in a preferred embodiment of the present invention, wherein FIG. 2(a) is the image before correction, and FIG. 2(b) is the image after correction;
FIG. 3 is a schematic diagram of an initialization process in a preferred embodiment of the present invention;
fig. 4 is a diagram illustrating robust estimation in a preferred embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention provides a monocular vision odometer positioning method based on a semi-direct method, which comprises the following steps as shown in figure 1:
s1, entering a target scene by using an unmanned aerial vehicle or a vehicle-mounted monocular camera and recording scene image data;
s2, preprocessing the scene image obtained in the step S1, correcting distortion and detecting characteristic points of the corrected image;
S3, initializing the system, tracking the feature points obtained in step S2, using the tracked feature point pair set as the data set of a robust algorithm, estimating the fundamental matrix F of the two images by the eight-point method and calculating the homography matrix H from four point pairs (in another preferred embodiment of the invention, these may be computed by methods such as nonlinear optimization), and selecting one of the fundamental matrix F and the homography matrix H for subsequent calculation by counting the inlier pairs of the two matrices;
S4, tracking the camera: extracting line segment features in the scene image and estimating the camera pose by a semi-direct method. First, sparse image alignment: a photometric error is established through the feature points on the current frame image and the previous frame image to obtain a rough camera pose; then, feature alignment: all map points in the map are projected onto the current frame using the camera pose obtained in the previous step, photometric errors are established from the gray difference between the projection points and the map points on the reference frame, and the positions of the projection points on the image are optimized; finally, a reprojection error with respect to the three-dimensional points and the camera pose is established using the optimized projection point positions, and the camera pose and the three-dimensional points are optimized to obtain the final camera pose of the current frame;
s5, selecting key frames in the camera shooting images;
S6, constructing a map: carrying out depth filtering on the end points of the line segment features in a key frame; newly constructing seed points from the newly acquired key frame, pre-updating the seed points once by using several key frames closest to the newly acquired key frame, and adding the seed points into the seed point queue; if the seed point queue has converged, newly constructing a key frame, projecting all three-dimensional points in the map onto the new key frame and, for each three-dimensional point visible on the new key frame, changing its reference frame to the new key frame; if the seed point queue has not converged, continuing to acquire key frames and update the seed point queue.
In this embodiment, the imaging distortion of the camera is composed of a radial part and a tangential part, described by a 5-dimensional distortion coefficient vector (k1, k2, k3, p1, p2). Since k3 has only a small influence, it does not participate in the computation in the present invention. The distortion coefficients can be solved during camera calibration; like the camera intrinsics, they depend only on the camera hardware. In the distortion correction process, the distorted image is known and the undistorted image is required. The common practice is, for each pixel coordinate (u, v) of the sought undistorted image, to compute the coordinate (ud, vd) at which that point appears in the distorted image; the pixel value at (ud, vd) then gives the pixel value of the undistorted image at (u, v). Traversing the coordinates of all distortion-corrected pixels yields the complete distortion-corrected image.
The distortion correction method comprises the following steps:

solving the physical coordinates (x', y') on the image plane from the given pixel coordinates:

x' = (u − cx)/fx, y' = (v − cy)/fy,

r² = x'² + y'²,

the physical coordinates after distortion are:

xd = x'(1 + k1·r² + k2·r⁴) + 2p1·x'y' + p2(r² + 2x'²),
yd = y'(1 + k1·r² + k2·r⁴) + p1(r² + 2y'²) + 2p2·x'y',

converting the distorted physical coordinates to pixel coordinates on the image:

ud = fx·xd + cx, vd = fy·yd + cy,

wherein (k1, k2, k3, p1, p2) are the distortion coefficients of the camera, (u, v) are the coordinates in the undistorted image, (ud, vd) are the corresponding coordinates in the distorted image, (cx, cy) is the position of the camera optical center, and (fx, fy) are the focal lengths of the camera in the x direction and the y direction.
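The correction mapping above can be sketched in a few lines. This is a minimal illustration of the radial-plus-tangential model as written (k3 omitted, as in the text); the parameter values used in the example are arbitrary, not from the patent.

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2, p1, p2):
    # normalized image-plane coordinates of the undistorted pixel
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    # radial term (k3 omitted, as in the text) plus tangential term
    radial = 1 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # back to pixel coordinates in the distorted image
    return fx * xd + cx, fy * yd + cy
```

With all coefficients zero the mapping is the identity, and the principal point (cx, cy) is always a fixed point, which is a quick sanity check on the formulas.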
In fig. 2, the left side shows an image before distortion correction, and the right side shows an effect image after distortion correction. The red line in the figure is a straight line in the real world, however, because of the distortion, the left figure has been seriously bent, and after the distortion is corrected, the position of the red line in the right figure is corrected to be a straight line.
In the present embodiment, the method for detecting the feature point includes:
establishing a circle with a pixel point of the image as the center and j as the radius, where j is a positive integer greater than 1, and judging whether the pixel is a feature point from the difference between the pixels on the circle and the center: if there exist k contiguous pixels on the circle whose gray values all differ from the gray value of the center point p by more than a threshold (all brighter or all darker), then p is a feature point, where k = j × j. In the present invention, j is preferably 3. Since this method produces a large number of useless feature points at edges in the image, after the initial detection, non-maximum suppression is performed on all detected feature points and the best feature point in each area is selected.
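The contiguous-arc test above, with j = 3 (a 16-pixel Bresenham circle and k = 9), can be sketched in pure Python. This is an illustrative simplification of the detection rule as stated, not the patent's exact implementation; the image is a plain list of rows of gray values.

```python
# Bresenham circle of radius 3: 16 pixel offsets, in order around the circle
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, thresh, k=9):
    """True if k contiguous circle pixels are all brighter, or all
    darker, than the center by more than thresh (the rule above)."""
    c = img[y][x]
    vals = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    # doubling the list handles wrap-around of the contiguous arc
    brighter = [v > c + thresh for v in vals * 2]
    darker = [v < c - thresh for v in vals * 2]
    for flags in (brighter, darker):
        run = 0
        for f in flags[:len(vals) + k]:
            run = run + 1 if f else 0
            if run >= k:
                return True
    return False
```

Non-maximum suppression, as the text notes, would then keep only the strongest response in each neighborhood.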
In the present embodiment, as shown in fig. 3, the initialization specifically includes:
S31, determining the number of feature points in the picture: if the number of feature points exceeds a threshold (e.g. 100), judging whether a reference frame exists (i.e. the first image whose number of feature points exceeded the threshold); if a reference frame exists, executing step S32; if no reference frame exists, saving the picture as the reference frame and then executing step S32; when the number of feature points does not exceed the threshold, waiting for the next scene picture to be input;
S32, tracking the feature points: for two consecutive times t, t+1, acquire the gray images It, It+1. Let w be a window to be tracked in It; the positions of the pixels within the window in the two images satisfy:

It(x, y) = It+1(x + δx, y + δy),

and the matching position of the window to be tracked is found by minimizing the sum of squared gray differences of the window between the images:

E(w) = Σ(x,y)∈w [It(x, y) − It+1(x + δx, y + δy)]²,

where E(w) is the matching cost of the window to be tracked. Performing a Taylor expansion of this formula to the first-order linear term and setting the partial derivatives with respect to δx and δy to zero gives the minimum, yielding the motion (δx, δy) of the window to be tracked between the images;
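The Taylor-expansion solution above is a gradient-based shortcut for minimizing the SSD objective; as an illustrative stand-in, the same objective can be minimized by a brute-force search over integer shifts. This is a sketch for intuition only (the actual method solves the linearized system), with hypothetical image-as-list-of-rows inputs.

```python
def track_window_ssd(img_t, img_t1, x0, y0, w, search=3):
    """Find the integer shift (dx, dy) minimizing the sum of squared
    gray differences of a w-by-w window between two frames."""
    best, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ssd = 0
            for j in range(w):
                for i in range(w):
                    d = img_t[y0 + j][x0 + i] - img_t1[y0 + j + dy][x0 + i + dx]
                    ssd += d * d
            if best is None or ssd < best:
                best, best_shift = ssd, (dx, dy)
    return best_shift
```

The linearized (Lucas-Kanade-style) solution reaches the same minimum in a few iterations without scanning every shift, which is why the text uses the derivative form.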
S33, judging the tracking effect: if the threshold requirement is satisfied (the number of successfully tracked feature points is more than 50), executing step S34; otherwise, judging whether the interval between this frame and the reference frame exceeds 50 frames; if so, deleting the reference frame and waiting for the next picture to be input, and if not, directly waiting for the next picture to be input;
S34, as shown in fig. 4: randomly selecting from the matched feature point pairs of the two images the minimum number of data required to compute the target model, and generating a model from the selected minimal data, where the model is the fundamental matrix F estimated by the eight-point method or the homography matrix H computed from four point pairs; fitting the data in the data set with the model, marking data consistent with the current model as inliers of the current iteration and the rest as outliers; evaluating the quality of the current model by the proportion of inliers and recording the model with the highest inlier proportion; if the iteration end condition is satisfied, ending, otherwise returning to the beginning of this step. The iteration end condition comprises two parts: the first part is reaching the maximum number of iterations; the second part is an iteration count k, updated at each iteration by the formula:

k = log(1 − p) / log(1 − wᵐ),

wherein p is the confidence, w is the inlier proportion under the current best model, and m is the minimum number of data required to fit the model; iteration stops when the iteration count exceeds k or reaches the maximum number of iterations;
estimating the fundamental matrix F of the two images by the eight-point method and calculating the homography matrix H from four point pairs: when estimating the fundamental matrix, the minimal subset is 8 randomly selected point pairs, and when estimating the homography matrix, the minimal subset is 4 randomly selected point pairs;
the simultaneously estimated fundamental matrix F and homography matrix H are then evaluated. For an estimated model M, its score over the feature point pair set P is expressed as sM:

sM = Σi ρM(d²(pi, pi', M)),

wherein (pi, pi') denotes a pair of successfully tracked feature points, d²(p, p', M) denotes the symmetric transfer error of the two points under model M, and ρM(d²(p, p', M)) denotes a chi-square test on d²(p, p', M) to determine whether the pair is an inlier pair; the number of final inlier pairs is the final score of the model. The score sF of the fundamental matrix and the score sH of the homography matrix are calculated in this way, and the ratio R = sH / (sH + sF) is formed: if R > 0.4, the homography matrix H is selected; otherwise the fundamental matrix F is selected.
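The selection rule reduces to a one-line score ratio. A sketch assuming R = sH / (sH + sF), the usual score ratio used in this kind of H-versus-F model selection (the threshold 0.4 is the value given in the text):

```python
def select_model(s_H, s_F, threshold=0.4):
    """Choose the homography H or the fundamental matrix F from
    their inlier scores: R = s_H / (s_H + s_F), H if R > threshold."""
    R = s_H / (s_H + s_F)
    return 'H' if R > threshold else 'F'
```

Intuitively, a planar or low-parallax scene makes most pairs inliers of H (large sH, H chosen), while a scene with real depth favors F.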
In this embodiment, the step S4 specifically includes:
S41, extracting line segment features of the scene image by using the LSD algorithm (see the paper "LSD: a Line Segment Detector"), uniformly sampling the extracted line segments, and converting them into point features;
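Turning a line segment into point features by uniform sampling is straightforward; a minimal sketch (the number of samples n is an illustrative choice, not specified by the patent):

```python
def sample_segment(p0, p1, n):
    """Uniformly sample n points (endpoints included) on the 2-D
    segment from p0 to p1, converting a line feature into points."""
    (x0, y0), (x1, y1) = p0, p1
    return [(x0 + (x1 - x0) * t / (n - 1), y0 + (y1 - y0) * t / (n - 1))
            for t in range(n)]
```

The resulting sample points are then treated exactly like ordinary feature points in the photometric-error terms of the next step.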
S42, extracting feature points of a newly arrived image frame; aligning sparse images, establishing photometric errors through the feature points on the current frame image and the previous frame image, and solving a rough camera pose; aligning features, projecting all map points in the map onto the current frame using the camera pose obtained in the previous step, establishing photometric errors from the gray difference between the projection points and the map points on the reference frame, and optimizing the positions of the projection points on the image; finally, establishing a reprojection error with respect to the three-dimensional points and the camera pose using the optimized projection point positions, and optimizing the camera pose and the three-dimensional points to obtain the final camera pose of the current frame;
wherein sparse image alignment refers to:
assume the current time is k with current image frame Ik, and the previous time is k−1 with previous image frame Ik−1; the relative pose of the two adjacent frames is to be solved. Let pi be any feature point on image Ik−1, and let Pi be the three-dimensional point obtained by back-projecting pi into its camera coordinate system; projecting Pi onto Ik gives the projection point pi'. The photometric error is then:

e = Ik−1(pi) − Ik(pi'),

for line segment features, the segment is uniformly sampled, and the sample points on the previous frame image are back-projected into three-dimensional space and then projected onto the current frame; the photometric error of a line segment feature is the sum of the photometric errors of its sampling points. If the camera intrinsic matrix is K and the camera pose of the current frame relative to the previous frame is ξ, then:

pi' = K exp(ξ^) Pi,
at each iteration an increment of the camera pose is solved so that the value of the whole cost function decreases, until the condition to stop iterating is satisfied. Left-multiplying exp(ξ^) by a small perturbation δξ gives:

e(δξ ⊕ ξ) = Ik−1(pi) − Ik(K exp(δξ^) exp(ξ^) Pi),

expanding the above equation by a first-order Taylor expansion and simplifying:

e(δξ ⊕ ξ) ≈ e(ξ) − (∂Ik/∂u)·(∂u/∂δξ)·δξ,

in the above formula, ∂Ik/∂u is the image gradient at the point pi', and ∂u/∂δξ is the derivative of the projection equation with respect to the camera pose, obtained by the chain rule:

∂u/∂δξ = (∂u/∂Pi')·(∂Pi'/∂δξ),

wherein Pi' = exp(ξ^)Pi is the three-dimensional point in the camera coordinate system, with coordinates (X', Y', Z'); then:

∂u/∂Pi' = [[fx/Z', 0, −fx·X'/Z'²], [0, fy/Z', −fy·Y'/Z'²]],

∂Pi'/∂δξ = [I, −Pi'^],

thus, in the sparse image alignment stage, the Jacobian matrix used to build the photometric error cost function is:

J = −(∂Ik/∂u)·(∂u/∂Pi')·(∂Pi'/∂δξ).

Through continuous iteration, the total error is reduced; the relative pose of the camera between the previous moment and the current moment is finally obtained, and the camera pose of the current camera in the world coordinate system is obtained through accumulation;
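The middle factor of the chain rule, the derivative of the pinhole projection with respect to the camera-frame point, is the piece that appears repeatedly in this derivation. A minimal sketch assuming the standard pinhole model u = (fx·X/Z + cx, fy·Y/Z + cy) (the constant offsets cx, cy vanish in the derivative):

```python
def projection_jacobian(P, fx, fy):
    """Jacobian of the pinhole projection with respect to the
    camera-frame point P = (X, Y, Z): a 2x3 matrix whose columns
    are d(u, v)/dX, d(u, v)/dY, d(u, v)/dZ."""
    X, Y, Z = P
    return [[fx / Z, 0.0, -fx * X / (Z * Z)],
            [0.0, fy / Z, -fy * Y / (Z * Z)]]
```

Note the 1/Z scaling: distant points move less in the image for the same camera motion, which is exactly what the third column expresses.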
the feature alignment includes: projecting all three-dimensional points visible in the current image in the map onto the image by using the camera postures obtained by sparse image alignment to obtain a series of projection points, and for the three-dimensional points P in the mapi(X, Y, Z), the key frame in which it was first observed is called its reference key frame, and the image of its reference key frame is recorded as IrefIts position in the reference frame is piThe three-dimensional point being visible on the image at time k and whichProjection point uiAnd adopting the photometric error:
e=Iref(pi)-Ik(ui),
the parameter to be optimized for feature alignment is the position u of the projection pointi,
The line segment is represented by two end points in a three-dimensional space, the end points of the three-dimensional line segment are respectively projected into the current image in the characteristic alignment stage, and the alignment of the line segment characteristics can be completed by aligning the positions of the end points,
and finally, in the pose and three-dimensional structure optimization stage, a reprojection error is established using the distance between the optimized and unoptimized positions of the projection points from the feature alignment stage, optimizing the camera pose obtained in the sparse alignment stage and the three-dimensional point positions in the map. For any three-dimensional point Pi in the map with projection point ui, let its optimized position be ui'; then:

e = ui' − ui,
for the line segment features, the reprojection error is composed of the errors of the two end points; expressed with the perturbation model of the lie algebra, the Jacobian of the reprojection error with respect to the camera pose is:

∂e/∂δξ = −(∂u/∂P_i')·(∂P_i'/∂δξ),

wherein P_i' represents the three-dimensional point P_i expressed in the camera coordinate system; the above formula actually represents the derivative of the projection equation with respect to the camera pose. When the three-dimensional points are optimized, it is necessary to know how e changes as the three-dimensional point changes, so the derivative of e with respect to the three-dimensional point is analyzed by the chain rule:

∂e/∂P_i = (∂e/∂P_i')·(∂P_i'/∂P_i),
assuming the current camera pose is R, t, then from the projection equation:
P_i' = R·P_i + t,

the derivative of P_i' with respect to P_i is in fact the rotation matrix R, so the Jacobian matrix of the reprojection error with respect to the three-dimensional point is:

∂e/∂P_i = (∂e/∂P_i')·R;
combining the Jacobian matrix of the reprojection error with respect to the pose and the Jacobian matrix with respect to the three-dimensional points gives the iteration Jacobian used in the pose-and-structure optimization process.
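The chain rule above — the derivative of the camera-frame point with respect to the world point being the rotation matrix R — can be checked numerically. A sketch with an assumed pinhole model; the intrinsics and pose below are illustrative values, not ones taken from the patent:

```python
import numpy as np

fx, fy = 500.0, 460.0  # illustrative focal lengths (assumption)

def project(Pc):
    """Pinhole projection of a point expressed in the camera frame."""
    X, Y, Z = Pc
    return np.array([fx * X / Z, fy * Y / Z])

def du_dPc(Pc):
    """Analytic derivative of the projection w.r.t. the camera-frame point."""
    X, Y, Z = Pc
    return np.array([[fx / Z, 0.0, -fx * X / Z**2],
                     [0.0, fy / Z, -fy * Y / Z**2]])

# An arbitrary pose (rotation about z plus a translation) and world point.
th = 0.1
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
t = np.array([0.1, -0.2, 0.3])
P = np.array([0.5, -0.3, 2.0])

# Chain rule: d(projection)/dP = du_dPc(P') @ R, with P' = R P + t.
J_analytic = du_dPc(R @ P + t) @ R

# Central finite-difference check of the same Jacobian.
eps = 1e-6
J_numeric = np.zeros((2, 3))
for i in range(3):
    d = np.zeros(3); d[i] = eps
    J_numeric[:, i] = (project(R @ (P + d) + t) -
                       project(R @ (P - d) + t)) / (2 * eps)
```

With e = u_i' − u_i the reprojection-error Jacobian is the negative of this matrix; the sign does not affect the check.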
In this embodiment, the step S5 specifically includes:
one of the following three conditions is satisfied to begin selecting a key frame:
① the map construction thread is idle; this mainly covers two aspects: the queue of key frames to be processed is empty, and at the same time, in the map construction thread, the number of three-dimensional points to be updated is small; at this moment, adding a key frame is considered in order to increase the density of the map;
② the visibility on the current frame of the feature points of the key frame nearest to the current frame is low; even a key frame far from the current frame should overlap the current frame considerably in a larger scene, so when this happens it indicates that the camera is entering a new area, and a key frame needs to be added immediately to ensure correct tracking of the camera;
③ the displacement in relative pose of the current image and the previous image exceeds a certain proportion of the average depth of the scene;
an image frame also needs to satisfy the following condition to be selected as a key frame:
① it is within a certain distance of the nearest key frame (0.5-2; this is set manually, depends on the data used, and therefore has no fixed standard);
② in the feature alignment stage, the number of three-dimensional points whose feature points can be re-projected onto the current image is not lower than a certain proportion (40%) of that of the previous frame;
the key frame selection algorithm is as follows:
inputting a current image frame and a previous image frame;
if the key-frame queue is empty and the number of non-converged three-dimensional points is less than 100, then needKfs = true and key-frame selection begins; the key frame closest to the current frame is found, and all three-dimensional points on that key frame are projected onto the current image frame; if (three-dimensional points visible in the current frame)/(all three-dimensional points on the key frame) < 0.5, then needKfs = true,
if (displacement of the current frame)/(scene mean depth) > 0.1, needKfs = true,
if needKfs is true, the distance of the current frame from the key frame is calculated,
if the distance between the current frame and the key frame is less than the maximum distance threshold and greater than the minimum distance threshold, then all points in the map are projected onto the current image frame with visible count m, and all points in the map are projected onto the previous image frame with visible count n,
if m/n >= 0.4, the current frame is a key frame.
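The selection logic above can be sketched directly; the thresholds 100, 0.5, 0.1 and 0.4 are the ones stated in the text, while the function signatures are illustrative:

```python
def need_keyframe(queue_empty, n_unconverged, n_visible_on_cur,
                  n_points_on_kf, displacement, mean_depth):
    """Sketch of the three key-frame trigger conditions described above."""
    if queue_empty and n_unconverged < 100:
        return True                      # mapping thread is idle
    if n_visible_on_cur / n_points_on_kf < 0.5:
        return True                      # low overlap with nearest key frame
    if displacement / mean_depth > 0.1:
        return True                      # moved a fraction of the scene depth
    return False

def accept_keyframe(dist_to_kf, d_min, d_max, m, n):
    """A candidate frame is accepted only if it lies within the distance
    band and at least 40% of the map points visible in the previous
    frame are still visible (m/n >= 0.4)."""
    return d_min < dist_to_kf < d_max and m / n >= 0.4
```

A frame becomes a key frame only when a trigger fires *and* the acceptance test passes.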
In this embodiment, the step S6 specifically includes:
performing depth filtering on the end points of the line segment features: when the depth filter obtains a key frame, the feature points on the key frame are initialized as seed points and placed in a seed queue; when the depth filter obtains an ordinary image frame, all seed points in the seed queue are updated with that image frame, i.e. the depth value and the depth uncertainty of each seed point are updated; the position of a seed point on a new image is uncertain, but it lies on a straight line (the epipolar line);
searching along the epipolar line of the seed point's feature point u of the reference frame on the current image to obtain the corresponding point u' (epipolar matching); during the search, the pixels in the neighborhood of the feature point are used as a template for template matching;
after the matching point is obtained, the depth is calculated by triangulation; the original depth estimate and its uncertainty are updated with a Bayesian probability model; if the updated depth information has converged it is added to the map, if it has diverged the seed point is deleted, and otherwise the seed point waits for the next update;
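The Bayesian update step can be illustrated with a simplified Gaussian-fusion depth filter. Note this is an illustrative reduction: the cited REMODE model additionally tracks an inlier ratio with a Beta prior, which is omitted here:

```python
def fuse_depth(mu, sigma2, z, tau2):
    """One simplified depth-filter update: fuse the current Gaussian
    estimate N(mu, sigma2) with a new triangulated measurement z of
    variance tau2 by inverse-variance weighting."""
    new_sigma2 = sigma2 * tau2 / (sigma2 + tau2)
    new_mu = (tau2 * mu + sigma2 * z) / (sigma2 + tau2)
    return new_mu, new_sigma2

def converged(sigma2, thresh):
    """A seed is added to the map once its variance drops below a threshold."""
    return sigma2 < thresh
```

Each fused measurement shrinks the variance, so a seed observed consistently over several frames converges and is promoted to a map point.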
depth pre-updating: a seed point is newly built from a new key frame, pre-updated once using several key frames closest to the new key frame, and then added to the seed point queue; the specific depth pre-update steps are as follows:
acquiring a new key frame;
acquiring the average depth of a new key frame;
find the n key frames closest to the new key frame,
for feature points ft in the new keyframe:
newly building a seed point by using the characteristic point ft, and initializing the depth to be the average depth;
for a key-frame kf of the n key-frames:
carrying out epipolar line search with a new key frame;
the triangulation calculates the depth and uncertainty of the feature point ft, and the seed point is updated based on a Bayesian probability model; the specific calculation method can adopt the method described in the paper "REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time";
adding seed points to a seed point queue
Outputting a new seed point queue;
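The pre-update steps above rely on triangulation to turn each epipolar match into a depth measurement. A sketch using standard linear (DLT) two-view triangulation; the projection matrices and image points below are synthetic stand-ins, not data from the patent:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) two-view triangulation: given two 3x4 projection
    matrices and matched (normalized) pixel coordinates, build the
    homogeneous system A X = 0 and take its SVD null vector as the
    3-D point."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Synthetic check: cameras [I|0] and [I|t] and a known 3-D point.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
h = np.append(X_true, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]
x2 = (P2 @ h)[:2] / (P2 @ h)[2]
X_est = triangulate(P1, P2, x1, x2)
```

The depth used by the filter is the Z component of the triangulated point in the reference camera frame.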
every time a key frame is newly created, all three-dimensional points in the map are projected onto the new key frame; if a three-dimensional point is visible on the new key frame, its reference frame is changed to the new key frame. All three-dimensional points in the image are projected into the latest key frame, a photometric error is established between the gray level of each projection point in the latest image and the gray level of the corresponding three-dimensional point in its reference frame, the positions of the projection points in the latest image frame are optimized, and the map is thereby constructed.
It should be noted that the calculation method of the basis matrix F, the homography matrix H and the rotation matrix R in the present invention adopts the existing calculation method, and the acquisition method of the camera internal reference matrix K adopts the acquisition method commonly used in the art.
The invention also provides a monocular visual odometry system based on the semi-direct method, which comprises a monocular camera and a processor, wherein the monocular camera is carried by an unmanned aerial vehicle or a vehicle; the monocular camera acquires images and transmits them to the processor, and the processor processes the images by the method described above and constructs a map.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (6)
1. A monocular vision odometer positioning method based on a semi-direct method is characterized by comprising the following steps:
s1, entering a target scene by using an unmanned aerial vehicle or a vehicle-mounted monocular camera and recording scene image data;
s2, preprocessing the scene image obtained in the step S1, correcting distortion and detecting characteristic points of the corrected image;
s3, initializing the system, tracking the characteristic points obtained in the step S2, taking the tracked characteristic point pair set as a data set of a robustness algorithm, estimating a basic matrix F of two images by using an eight-point method, calculating a homography matrix H by using four point pairs, and selecting one of the basic matrix F and the homography matrix H for subsequent calculation by calculating the inner point pairs of the two matrixes;
s4, tracking and positioning the camera: line segment features are extracted from the scene image and the camera pose is estimated by the semi-direct method; firstly, sparse image alignment: a photometric error is established through the feature points on the current frame image and the previous frame image to obtain a rough camera pose; then, feature alignment: all map points in the map are projected onto the current frame using the camera pose obtained in the previous step, a photometric error is established from the gray-level difference between the projection points and the map points on the reference frame, and the positions of the projection points on the image are optimized; finally, a re-projection error relating the three-dimensional points and the camera pose is established using the optimized and non-optimized positions of the projection points, and the camera pose and the three-dimensional points are optimized to obtain the final camera pose of the current frame;
s5, selecting a key frame from the camera image, specifically including:
one of the following three conditions is satisfied to begin selecting a key frame:
① the map construction thread is idle; this mainly covers two aspects: the queue of key frames to be processed is empty, and at the same time, in the map construction thread, the number of three-dimensional points to be updated is small; at this moment, adding a key frame is considered in order to increase the density of the map;
② the visibility on the current frame of the feature points of the key frame nearest to the current frame is low; even a key frame far from the current frame should overlap the current frame considerably in a larger scene, so when this happens it indicates that the camera is entering a new area, and a key frame needs to be added immediately to ensure correct tracking of the camera;
③ the displacement in relative pose of the current image and the previous image exceeds a certain proportion of the average depth of the scene;
an image frame also needs to satisfy the following condition to be selected as a key frame:
① it is within a certain distance of the nearest key frame;
② in the feature alignment stage, the number of three-dimensional points whose feature points can be re-projected onto the current image is not lower than a certain proportion of that of the previous frame;
the key frame selection algorithm is as follows:
inputting a current image frame and a previous image frame;
if the key-frame queue is empty and the number of non-converged three-dimensional points is less than 100, then needKfs = true and key-frame selection begins; the key frame closest to the current frame is found, and all three-dimensional points on that key frame are projected onto the current image frame; if (three-dimensional points visible in the current frame)/(all three-dimensional points on the key frame) < 0.5, then needKfs = true,
if (displacement of the current frame)/(scene mean depth) > 0.1, needKfs = true,
if needKfs is true, the distance of the current frame from the key frame is calculated,
if the distance between the current frame and the key frame is less than the maximum distance threshold and greater than the minimum distance threshold, then all points in the map are projected onto the current image frame with visible count m, and all points in the map are projected onto the previous image frame with visible count n,
if m/n >= 0.4, the current frame is a key frame;
s6, constructing a map, carrying out depth filtering on end points of line segment features in a key frame, newly constructing a seed point from the newly acquired key frame, carrying out one-time pre-updating on the seed point by using a plurality of key frames closest to the newly acquired key frame, adding the seed point into a seed point queue, if the seed point queue is converged, newly constructing a key frame, projecting all three-dimensional points in the map onto the new key frame, if the three-dimensional points are visible on the new key frame, changing a reference frame of the three-dimensional points into the new key frame, and if the seed point queue is not converged, continuously acquiring the key frame and updating the seed point queue;
the method specifically comprises the following steps:
performing depth filtering on the end points of the line segment features: when the depth filter obtains a key frame, the feature points on the key frame are initialized as seed points and placed in a seed queue; when the depth filter obtains an ordinary image frame, all seed points in the seed queue are updated with that image frame; the position of a seed point on a new image is uncertain, but it lies on a straight line (the epipolar line);
searching along the epipolar line of the seed point's feature point u of the reference frame on the current image to obtain the corresponding point u'; during the search, the pixels in the neighborhood of the feature point are used as a template for template matching;
after the matching point is obtained, the depth is calculated by triangulation; the original depth estimate and its uncertainty are updated with a Bayesian probability model; if the updated depth information has converged it is added to the map, if it has diverged the seed point is deleted, and otherwise the seed point waits for the next update;
depth pre-updating: a seed point is newly built from a new key frame, pre-updated once using several key frames closest to the new key frame, and then added to the seed point queue; the specific depth pre-update steps are as follows:
acquiring a new key frame;
acquiring the average depth of a new key frame;
find the n key frames closest to the new key frame,
for feature points ft in the new keyframe:
newly building a seed point by using the characteristic point ft, and initializing the depth to be the average depth;
for a key-frame kf of the n key-frames:
carrying out epipolar line search with a new key frame;
triangulation calculates the depth and uncertainty of the feature point ft;
updating the seed points based on a Bayesian probability model;
the seed points are added to a seed point queue,
outputting a new seed point queue;
every time a key frame is newly created, all three-dimensional points in the map are projected onto the new key frame; if a three-dimensional point is visible on the new key frame, its reference frame is changed to the new key frame. All three-dimensional points in the image are projected into the latest key frame, a photometric error is established between the gray level of each projection point in the latest image and the gray level of the corresponding three-dimensional point in its reference frame, the positions of the projection points in the latest image frame are optimized, and the map is thereby constructed.
2. The monocular visual odometer positioning method based on the semi-direct method according to claim 1, wherein the distortion correction method is:
solving its physical coordinates (x', y') on the image plane from the given pixel coordinates:

x' = (u − c_x)/f_x, y' = (v − c_y)/f_y,

the physical coordinates after distortion are:

x_d = x'(1 + k_1·r² + k_2·r⁴ + k_3·r⁶) + 2p_1·x'y' + p_2(r² + 2x'²),
y_d = y'(1 + k_1·r² + k_2·r⁴ + k_3·r⁶) + p_1(r² + 2y'²) + 2p_2·x'y',
r² = x'² + y'²,

converting the distorted physical coordinates to pixel coordinates on the image:

u_d = f_x·x_d + c_x, v_d = f_y·y_d + c_y,

wherein (k_1, k_2, k_3, p_1, p_2) are the distortion coefficients of the camera, (u, v) are the coordinates in the undistorted image, (u_d, v_d) are the coordinates in the distorted image corresponding to (u, v), (c_x, c_y) is the position of the camera optical center, and (f_x, f_y) are the focal lengths of the camera in the x and y directions.
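The coefficients named in the claim are those of the standard radial-tangential (Brown) distortion model. A sketch of the forward mapping under that assumption; distortion correction inverts this map, typically by iteration or a precomputed remap table:

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2, k3, p1, p2):
    """Map an undistorted pixel (u, v) to its distorted position
    (u_d, v_d) with the radial-tangential model, using the
    coefficients (k1, k2, k3, p1, p2) named in the claim."""
    xp = (u - cx) / fx            # normalized image-plane coordinates
    yp = (v - cy) / fy
    r2 = xp * xp + yp * yp
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xp * radial + 2 * p1 * xp * yp + p2 * (r2 + 2 * xp * xp)
    yd = yp * radial + p1 * (r2 + 2 * yp * yp) + 2 * p2 * xp * yp
    return fx * xd + cx, fy * yd + cy
```

With all coefficients zero the mapping is the identity; positive k1 pushes off-center pixels outward (barrel distortion of the inverse map).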
3. The monocular visual odometer positioning method based on the semi-direct method according to claim 1, wherein the method for feature point detection is:
establishing a circle by taking a pixel point in the image as the center of the circle and j as the radius, and judging whether the pixel point is a characteristic point or not according to the difference between the pixel point on the circle and the center of the circle, wherein j is a positive integer greater than 1; if the difference between the gray value of the central point p and the gray value of any continuous k pixels on the circle is larger than or smaller than the threshold, the point p is a feature point, and the value of k is j × j.
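The circle test of claim 3 is FAST-style corner detection. A minimal sketch in which the circle offsets, threshold and arc length k are left as assumed parameters (FAST itself uses the 16 Bresenham offsets of radius 3):

```python
def is_corner(img, x, y, offsets, thresh, k):
    """Centre pixel (x, y) is a feature point if at least k contiguous
    pixels on the circle are all brighter than centre+thresh or all
    darker than centre-thresh. `img` is indexed as img[row][col] and
    `offsets` lists the (dx, dy) circle positions in order."""
    c = img[y][x]
    diffs = [img[y + dy][x + dx] - c for dx, dy in offsets]
    for sign in (1, -1):
        run = 0
        for d in diffs + diffs[:k - 1]:     # wrap around the circle
            run = run + 1 if sign * d > thresh else 0
            if run >= k:
                return True
    return False
```

A bright or dark ring around the centre fires the test; a flat patch does not.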
4. The monocular visual odometer positioning method based on the semi-direct method according to claim 1, wherein the step S3 specifically comprises:
s31, judging the number of the feature points in the picture, if the number of the feature points exceeds the threshold value, judging whether a reference frame exists, if so, executing a step S32, if not, saving the picture as the reference frame, executing a step S32, and when the number of the feature points does not exceed the threshold value, waiting for inputting the next scene picture;
s32, tracking the feature points: for two consecutive time points t and t+1, gray-level images I_t and I_{t+1} are acquired; suppose I_t contains a window w to be tracked; the positions of the pixels within the window w in the two images satisfy:
It(x,y)=It+1(x+δx,y+δy),
the matching position of the window to be tracked is found by minimizing the sum of squared gray differences of the window between the images:

e(w) = Σ_{(x,y)∈w} [I_t(x,y) − I_{t+1}(x+δx, y+δy)]²,

wherein e(w) is the matching cost of the window to be tracked;
a first-order Taylor expansion of this formula is performed; setting the partial derivatives to zero yields the minimum, giving the motion (δx, δy) of the window to be tracked between the images;
s33, judging the tracking effect, if the tracking effect meets the requirement, executing the step S34, otherwise judging whether the interval between the frame and the reference frame exceeds 50 frames, if so, deleting the reference frame, inputting the picture again, and if not, inputting the picture again;
s34, randomly selecting from the matched feature-point pairs of the two images the minimum number of data needed to compute a target model, and generating a model from the selected minimal data, the model being either the basis matrix F estimated for the two images by the eight-point method or the homography matrix H calculated from four point pairs; the data in the data set are fitted with this model, data conforming to the current model are marked as inliers of this iteration and the rest as outliers; the quality of the current model is evaluated by the proportion of inliers, and the model with the highest inlier proportion is recorded; the procedure finishes if the iteration-termination condition is met and otherwise returns to the beginning of the step. The iteration-termination condition comprises two parts: the first part is reaching the maximum number of iterations; the second part is an iteration number k that is updated with each iteration, with the update formula:

k = log(1 − p) / log(1 − w^m),

wherein p is a confidence coefficient, w is the proportion of inliers under the current best model, and m is the minimum number of data required by the fitted model; iteration stops when the iteration count exceeds either k or the maximum number of iterations;
estimating a basic matrix F of the two images by using an eight-point method and calculating a homography matrix H by using four point pairs, wherein when the basic matrix is estimated, the minimum subset is 8 randomly selected point pairs, and when the homography matrix is estimated, the minimum subset is four randomly selected point pairs;
evaluating the simultaneously estimated basis matrix F and homography matrix H, the score of an estimated model M under the feature-point-pair set P is expressed as s_M:

wherein (p_i, p_i') denotes a successfully tracked pair of feature points, d²(p, p', M) denotes the symmetric transfer error of the two points under model M, and ρ_M(d²(p, p', M)) denotes a chi-square test on d²(p, p', M) to determine whether the point pair is an inlier pair; the number of final inlier pairs is the final score of the model, and in this way the score s_F of the basis matrix and the score s_H of the homography matrix are calculated:

if R > 0.4, the homography matrix H is selected, otherwise the basis matrix F is selected.
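The adaptive iteration count in step s34 follows the standard RANSAC termination rule; since the claim's formula image is not reproduced, the block below assumes the usual form k = log(1−p)/log(1−w^m):

```python
import math

def ransac_iterations(p, w, m):
    """Number of RANSAC iterations k needed so that, with confidence p,
    at least one all-inlier minimal sample of size m is drawn when the
    inlier ratio is w. Standard adaptive-termination formula; assumed
    here to match the elided equation in the claim."""
    return math.log(1 - p) / math.log(1 - w ** m)
```

For the eight-point basis matrix (m = 8) a 50% inlier ratio already demands on the order of a thousand iterations, while the four-point homography (m = 4) needs far fewer; this is why k is re-estimated as w improves.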
5. The monocular visual odometer positioning method based on the semi-direct method according to claim 1, wherein the step S4 specifically comprises:
s41, extracting line segment characteristics of the scene image by using an LSD algorithm, uniformly sampling the extracted line segment characteristics, and converting the line segment characteristics into point characteristics;
s42, feature points are extracted from a newly arrived image frame; sparse image alignment: a photometric error is established through the feature points on the current frame image and the previous frame image, and a rough camera pose is solved; feature alignment: all map points in the map are projected onto the current frame using the camera pose obtained in the previous step, a photometric error is established from the gray-level difference between the projection points and the map points on the reference frame, and the positions of the projection points on the image are optimized; finally, a re-projection error relating the three-dimensional points and the camera pose is established using the optimized and non-optimized positions of the projection points, and the camera pose and the three-dimensional points are optimized to obtain the final camera pose of the current frame;
wherein sparse image alignment refers to:
assume the current time is k with current image frame I_k, and the previous time is k−1 with previous image frame I_{k−1}; the relative pose of the two adjacent frames is solved at this time. Let p_i be any one of the feature points on image I_{k−1}; back-projected into three-dimensional space in its own camera coordinate system it becomes P_i, and projecting P_i onto I_k gives the projection point p_i'; then the photometric error is:

e = I_{k−1}(p_i) − I_k(p_i')
for the line segment characteristics, carrying out uniform sampling on the line segment, and back-projecting the line segment characteristics on the previous frame image to a three-dimensional space and then projecting the line segment characteristics to the current frame;
and the luminosity error of the line segment characteristic is the sum of the luminosity errors of the sampling points, if the camera internal reference matrix is K and the camera posture of the current frame relative to the previous frame is ξ:
p′i=Kexp(ξ^)Pi,
solving for an increment of camera pose per iteration such that the value of the entire cost function decreases until the condition to stop the iteration is satisfied, and pre-multiplying ξ by a small perturbation δ ξ:
the above equation is expanded by a first-order Taylor expansion and simplified to give:

in the above formula, ∂I/∂u is the image gradient at the point p_i', and ∂u/∂ξ is the derivative of the projection equation with respect to the camera pose, obtained by the chain rule:
wherein P_i' = exp(ξ^)P_i is the three-dimensional point in the camera coordinate system; if its coordinates are (X', Y', Z'), then the pinhole projection gives:

u = f_x·X'/Z' + c_x, v = f_y·Y'/Z' + c_y,

thus the derivative of the projected pixel with respect to the camera-frame point is:

∂u/∂P' = [ f_x/Z'   0   −f_x·X'/Z'² ;  0   f_y/Z'   −f_y·Y'/Z'² ],

then, in the sparse image alignment stage, the Jacobian matrix used to establish the photometric-error cost function is the product of the image gradient with this projection derivative and the derivative of the point with respect to the pose perturbation:

J = −(∂I/∂u)·(∂u/∂P')·(∂P'/∂δξ).
through continuous iteration the total error is reduced; finally the relative pose of the camera between the previous moment and the current moment is obtained, and the pose of the current camera in the world coordinate system is obtained by accumulation;
the feature alignment includes: projecting all three-dimensional points in the map that are visible in the current image onto the image, using the camera pose obtained by sparse image alignment, to obtain a series of projection points; for a three-dimensional point P_i(X, Y, Z) in the map, the key frame in which it was first observed is called its reference key frame, and the image of its reference key frame is recorded as I_ref; its position in the reference frame is p_i; the three-dimensional point is visible on the image at time k and its projection point is u_i, and the photometric error is adopted:
e=Iref(pi)-Ik(ui),
the parameter to be optimized in feature alignment is the position u_i of the projection point,
a line segment is represented by its two end points in three-dimensional space; in the feature alignment stage the end points of the three-dimensional line segment are each projected into the current image, and alignment of the line segment feature is completed by aligning the positions of the end points,
and finally, in the pose and three-dimensional structure optimization stage, a reprojection error is established using the distance between the optimized position of a projection point and its non-optimized position from the feature alignment stage, and the camera pose obtained in the sparse alignment stage and the three-dimensional point positions in the map are optimized; for any three-dimensional point p_i in the map with projection point u_i, assume its optimized position is u_i', then:
e=u′i-ui,
for the line segment features, the reprojection error is composed of the errors of the two end points; expressed with the perturbation model of the lie algebra, the Jacobian of the reprojection error with respect to the camera pose is:

∂e/∂δξ = −(∂u/∂P_i')·(∂P_i'/∂δξ),

wherein P_i' represents the three-dimensional point P_i expressed in the camera coordinate system; the above formula actually represents the derivative of the projection equation with respect to the camera pose. When the three-dimensional points are optimized, it is necessary to know how e changes as the three-dimensional point changes, so the derivative of e with respect to the three-dimensional point is analyzed by the chain rule:

∂e/∂P_i = (∂e/∂P_i')·(∂P_i'/∂P_i),
assuming the current camera pose is R, t, then from the projection equation:
P_i' = R·P_i + t,

the derivative of P_i' with respect to P_i is in fact the rotation matrix R, so the Jacobian matrix of the reprojection error with respect to the three-dimensional point is:

∂e/∂P_i = (∂e/∂P_i')·R;
combining the Jacobian matrix of the reprojection error with respect to the pose and the Jacobian matrix with respect to the three-dimensional points gives the iteration Jacobian used in the pose-and-structure optimization process.
6. A monocular visual odometer positioning system based on a semi-direct method, comprising a monocular camera carried by an unmanned aerial vehicle or a vehicle, and a processor; the monocular camera acquires images and transmits them to the processor, and the processor performs processing by the method of any one of claims 1 to 5 and constructs a map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810512342.XA CN108986037B (en) | 2018-05-25 | 2018-05-25 | Monocular vision odometer positioning method and positioning system based on semi-direct method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108986037A CN108986037A (en) | 2018-12-11 |
CN108986037B true CN108986037B (en) | 2020-06-16 |
Family
ID=64542055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810512342.XA Active CN108986037B (en) | 2018-05-25 | 2018-05-25 | Monocular vision odometer positioning method and positioning system based on semi-direct method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108986037B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111322993B (en) * | 2018-12-13 | 2022-03-04 | 杭州海康机器人技术有限公司 | Visual positioning method and device |
CN109443320A (en) * | 2019-01-10 | 2019-03-08 | 轻客小觅智能科技(北京)有限公司 | Binocular vision speedometer and measurement method based on direct method and line feature |
CN109816726B (en) * | 2019-01-29 | 2021-10-01 | 京东方科技集团股份有限公司 | Visual odometer map updating method and system based on depth filter |
CN109752008B (en) * | 2019-03-05 | 2021-04-13 | 长安大学 | Intelligent vehicle multi-mode cooperative positioning system and method and intelligent vehicle |
CN109727269B (en) * | 2019-03-29 | 2019-07-09 | 中国人民解放军国防科技大学 | Monocular vision and road map based matching positioning method |
CN110189390B (en) * | 2019-04-09 | 2023-02-14 | 南京航空航天大学 | Monocular vision SLAM method and system |
CN111862146B (en) * | 2019-04-30 | 2023-08-29 | 北京魔门塔科技有限公司 | Target object positioning method and device |
CN110335316B (en) * | 2019-06-28 | 2023-04-18 | Oppo广东移动通信有限公司 | Depth information-based pose determination method, device, medium and electronic equipment |
CN110363821B (en) * | 2019-07-12 | 2021-09-28 | 顺丰科技有限公司 | Monocular camera installation deviation angle acquisition method and device, camera and storage medium |
CN110473258B (en) * | 2019-07-24 | 2022-05-13 | 西北工业大学 | Monocular SLAM system initialization algorithm based on point-line unified framework |
CN110514212A (en) * | 2019-07-26 | 2019-11-29 | 电子科技大学 | A kind of intelligent vehicle map terrestrial reference localization method merging monocular vision and difference GNSS |
CN110807809B (en) * | 2019-10-25 | 2021-04-09 | 中山大学 | Light-weight monocular vision positioning method based on point-line characteristics and depth filter |
CN112967311B (en) * | 2019-12-12 | 2024-06-07 | 浙江商汤科技开发有限公司 | Three-dimensional line graph construction method and device, electronic equipment and storage medium |
CN111145255B (en) * | 2019-12-27 | 2022-08-09 | 浙江省北大信息技术高等研究院 | Pose calculation method and system combining deep learning and geometric optimization |
CN113409368B (en) * | 2020-03-16 | 2023-11-03 | 北京京东乾石科技有限公司 | Mapping method and device, computer readable storage medium and electronic equipment |
CN111462179B (en) * | 2020-03-26 | 2023-06-27 | 北京百度网讯科技有限公司 | Three-dimensional object tracking method and device and electronic equipment |
CN111553300B (en) * | 2020-05-08 | 2022-03-11 | 北京工商大学 | Multi-time-domain resolution lip language behavior detection method for three-dimensional point cloud video |
CN111583331B (en) * | 2020-05-12 | 2023-09-01 | 北京轩宇空间科技有限公司 | Method and device for simultaneous localization and mapping |
CN111721318B (en) * | 2020-05-26 | 2022-03-25 | 南京航空航天大学 | Template matching visual odometer based on self-adaptive search area |
CN111829522B (en) * | 2020-07-02 | 2022-07-12 | 浙江大华技术股份有限公司 | Instant positioning and map construction method, computer equipment and device |
CN111950599B (en) * | 2020-07-20 | 2022-07-01 | 重庆邮电大学 | Dense visual odometer method for fusing edge information in dynamic environment |
CN112115980A (en) * | 2020-08-25 | 2020-12-22 | 西北工业大学 | Binocular vision odometer design method based on optical flow tracking and point line feature matching |
CN112580683B (en) * | 2020-11-17 | 2024-01-12 | 中山大学 | Multi-sensor data time alignment system and method based on cross correlation |
CN112665575B (en) * | 2020-11-27 | 2023-12-29 | 重庆大学 | SLAM loop detection method based on mobile robot |
CN112633122B (en) * | 2020-12-17 | 2024-01-23 | 厦门大学 | Front-end odometry calculation method and system for a monocular VIO system |
CN112767538B (en) * | 2021-01-11 | 2024-06-07 | 浙江商汤科技开发有限公司 | Three-dimensional reconstruction and related interaction and measurement methods, related devices and equipment |
KR20220112575A (en) * | 2021-02-04 | 2022-08-11 | 삼성전자주식회사 | Method for performing simultaneous localization and mapping and device using same |
CN112862803B (en) * | 2021-02-26 | 2023-09-26 | 中国人民解放军93114部队 | Infrared imaging SLAM method and device based on edge and feature point fusion |
CN113108771B (en) * | 2021-03-05 | 2022-08-16 | 华南理工大学 | Movement pose estimation method based on closed-loop direct sparse visual odometer |
CN113112542A (en) * | 2021-03-25 | 2021-07-13 | 北京达佳互联信息技术有限公司 | Visual positioning method and device, electronic equipment and storage medium |
CN113206949B (en) * | 2021-04-01 | 2023-04-28 | 广州大学 | Semi-direct monocular vision SLAM method based on entropy weighted image gradient |
CN113362377B (en) * | 2021-06-29 | 2022-06-03 | 东南大学 | VO weighted optimization method based on monocular camera |
CN113674340A (en) * | 2021-07-05 | 2021-11-19 | 北京物资学院 | Binocular vision navigation method and device based on landmark points |
WO2023280274A1 (en) * | 2021-07-07 | 2023-01-12 | The Hong Kong University Of Science And Technology | Geometric structure aided visual localization method and system |
CN113592947B (en) * | 2021-07-30 | 2024-03-12 | 北京理工大学 | Method for realizing visual odometer by semi-direct method |
CN113689485B (en) * | 2021-08-25 | 2022-06-07 | 北京三快在线科技有限公司 | Method and device for determining depth information of unmanned aerial vehicle, unmanned aerial vehicle and storage medium |
CN113781567B (en) * | 2021-10-08 | 2024-05-31 | 西北工业大学 | Aerial image target geographic positioning method based on three-dimensional map generation |
CN113870353A (en) * | 2021-10-11 | 2021-12-31 | 重庆邮电大学 | Monocular vision-based indoor positioning method for unmanned aerial vehicle |
CN114708392B (en) * | 2022-03-22 | 2024-05-14 | 重庆大学 | Octree map construction method based on closed-loop track |
CN116625380B (en) * | 2023-07-26 | 2023-09-29 | 广东工业大学 | Path planning method and system based on machine learning and SLAM |
CN116681733B (en) * | 2023-08-03 | 2023-11-07 | 南京航空航天大学 | Near-distance real-time pose tracking method for space non-cooperative target |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680291A (en) * | 2012-09-09 | 2014-03-26 | 复旦大学 | Method for realizing simultaneous localization and mapping based on ceiling vision |
CN105809687A (en) * | 2016-03-08 | 2016-07-27 | 清华大学 | Monocular vision ranging method based on edge point information in image |
CN105869136A (en) * | 2015-01-22 | 2016-08-17 | 北京雷动云合智能技术有限公司 | Collaborative visual SLAM method based on multiple cameras |
CN107341814A (en) * | 2017-06-14 | 2017-11-10 | 宁波大学 | Quadrotor UAV monocular visual ranging method based on the sparse direct method |
CN107610175A (en) * | 2017-08-04 | 2018-01-19 | 华南理工大学 | Monocular vision SLAM algorithm based on the semi-direct method and sliding-window optimization |
CN107687850A (en) * | 2017-07-26 | 2018-02-13 | 哈尔滨工业大学深圳研究生院 | Unmanned vehicle pose estimation method based on vision and an inertial measurement unit |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140212001A1 (en) * | 2013-01-27 | 2014-07-31 | Quantum Signal Llc | Visual odometry |
2018
- 2018-05-25 CN CN201810512342.XA patent/CN108986037B/en active Active
Non-Patent Citations (3)
Title |
---|
"SVO: Semidirect visual odometry for monocular and multicamera systems"; Christian Forster et al.; IEEE Transactions on Robotics; 2016-12-31; Vol. 33, No. 2; pp. 249-265 * |
"Research progress in vision-based simultaneous localization and mapping"; Chen Chang et al.; Application Research of Computers; 2017-08-18 (No. 3); pp. 641-647 * |
"Monocular visual odometry with point and line feature fusion"; Yuan Meng et al.; Laser & Optoelectronics Progress; 2017-09-12 (No. 2); pp. 1-8 * |
Also Published As
Publication number | Publication date |
---|---|
CN108986037A (en) | 2018-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108986037B (en) | Monocular vision odometer positioning method and positioning system based on semi-direct method | |
CN110807809B (en) | Light-weight monocular vision positioning method based on point-line characteristics and depth filter | |
CN110223348B (en) | Robot scene self-adaptive pose estimation method based on RGB-D camera | |
CN104732518B (en) | Improved PTAM method based on intelligent robot ground features | |
CN108682027A (en) | VSLAM implementation method and system based on point and line feature fusion | |
JP5832341B2 (en) | Movie processing apparatus, movie processing method, and movie processing program | |
CN108010081B (en) | RGB-D visual odometer method based on Census transformation and local graph optimization | |
CN108615246B (en) | Method for improving robustness of visual odometer system and reducing calculation consumption of algorithm | |
CN108648194B (en) | Three-dimensional target identification segmentation and pose measurement method and device based on CAD model | |
CN108519102B (en) | Binocular visual odometry calculation method based on secondary projection | |
CN113108771A (en) | Movement pose estimation method based on closed-loop direct sparse visual odometer | |
CN110827321B (en) | Multi-camera collaborative active target tracking method based on three-dimensional information | |
JP7173471B2 (en) | 3D position estimation device and program | |
CN116449384A (en) | Radar inertial tight coupling positioning mapping method based on solid-state laser radar | |
CN111829522B (en) | Instant positioning and map construction method, computer equipment and device | |
CN116222543B (en) | Multi-sensor fusion map construction method and system for robot environment perception | |
CN112652020B (en) | Visual SLAM method based on AdaLAM algorithm | |
CN116468786B (en) | Semantic SLAM method based on point-line combination and oriented to dynamic environment | |
CN111882602A (en) | Visual odometer implementation method based on ORB feature points and GMS matching filter | |
CN111105467A (en) | Image calibration method and device and electronic equipment | |
CN115147344A (en) | Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance | |
KR101766823B1 (en) | Robust visual odometry system and method against irregular illumination changes | |
CN112767481B (en) | High-precision positioning and mapping method based on visual edge features | |
CN104156933A (en) | Image registering method based on optical flow field | |
CN111260725B (en) | Dynamic environment-oriented wheel speed meter-assisted visual odometer method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||