CN105809687B - Monocular vision ranging method based on edge point information in an image - Google Patents

Monocular vision ranging method based on edge point information in an image

Info

Publication number
CN105809687B
CN105809687B (application CN201610131438.2A)
Authority
CN
China
Prior art keywords
depth
frame
map
image
point
Prior art date
Legal status
Active
Application number
CN201610131438.2A
Other languages
Chinese (zh)
Other versions
CN105809687A (en)
Inventor
程农
杨盛
李清
田振
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201610131438.2A priority Critical patent/CN105809687B/en
Publication of CN105809687A publication Critical patent/CN105809687A/en
Application granted granted Critical
Publication of CN105809687B publication Critical patent/CN105809687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30181 Earth observation

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a UAV monocular vision ranging method based on edge point information in images and belongs to the technical field of unmanned aerial vehicle navigation and positioning. The method comprises: selecting two frames from the image sequence captured by a downward-looking monocular camera fixedly attached to the UAV to construct an initial map and an initial depth map, taking the first frame as the first key frame in the map and its camera coordinate system as the world coordinate system, thereby completing initialization; then running three threads in parallel for motion estimation, map construction and depth map estimation: the motion estimation thread aligns the current frame against the known map and depth map information to obtain the ranging result and optimizes the existing map information according to that result, while the map construction and depth map estimation threads run concurrently to maintain the map and depth map information. The invention makes full use of the multi-core architecture of modern processors, effectively exploits the edge point information in the image in combination with corner information, improves the efficiency of the algorithm, and has stronger adaptability.

Description

Monocular vision ranging method based on edge point information in image
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle navigation and positioning, and particularly relates to a monocular vision range finding method utilizing edge point information in an image.
Background
In many application scenarios GPS cannot be relied upon for accurate positioning, for example indoors, between urban buildings, in jungles or valleys, or even on extraterrestrial bodies. Navigating by vision is very intuitive: every sighted animal in nature uses vision in some form to navigate; insects such as flies and bees judge their motion relative to a target through optical flow, and humans judge their position most of the time from the scenery they see.
Without GPS, most micro unmanned aerial vehicles adopt, for reasons of cost and weight, a navigation scheme that combines an inertial measurement unit with a single camera, possibly supplemented by an ultrasonic sensor and a barometric sensor to obtain absolute height information. A drone in this configuration can achieve fully autonomous waypoint tracking.
The process of estimating the motion of a carrier equipped with one or more cameras using only their image input is called visual odometry (VO). Its application fields include robotics, augmented reality, autonomous driving and the like. The process is analogous to conventional wheel odometry: wheel odometry integrates wheel rotations, while visual odometry incrementally estimates the pose of the carrier by sensing changes in the input images. Effective operation of a visual ranging algorithm requires sufficient illumination and sufficient scene texture.
Monocular visual ranging uses only a single camera as input, so the system configuration is simple and the ability to adapt to changes of environmental scale is stronger than that of a binocular (stereo) vision system. Existing monocular visual ranging methods generally perform inter-frame matching using corner features in the image and therefore cannot adapt to scenes lacking corner features (corner features are important image features, i.e. pixels that stand out in some attribute; in the scene of FIG. 1 they appear mostly at intersections of straight lines, are few in number and structurally repetitive, so an existing corner-based monocular visual ranging algorithm is likely to fail). In addition, corner features account for only a small fraction of all pixels in the image, so monocular visual ranging performance can also be improved from the viewpoint of image information utilization.
The terms to which the present invention relates are described below:
frame: in the field of visual ranging it is customary to call each acquired image a frame; for example, the image obtained by the camera at the previous moment is called the previous frame, the image obtained at the current moment is called the current frame, and two consecutive images obtained by the camera are called adjacent frames;
key frame: because the frame rate of current cameras is high, the pose change between adjacent frames is often small. To improve the accuracy of motion estimation a key frame strategy is generally adopted: within a certain range of pose change, each newly obtained image is aligned only against one specific frame to estimate the current pose, and only after that range is exceeded is a new specific frame adopted for the next stage of image alignment; these specific frames used for image alignment are called key frames;
reference frame: the frame against which the current image is aligned is called the reference frame of the current image;
map: in the field of visual ranging, known environmental information (e.g., the positions of points that have been calculated, images that have been acquired, etc.) is stored and referred to as a map. The map can be used as prior information of subsequent image matching and motion estimation to increase the accuracy of the measuring range.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a monocular vision ranging method based on edge point information in an image, which can effectively utilize the edge point information in the image and, by combining corner information as an aid, has stronger adaptability to scenes lacking corner features.
The invention provides a UAV monocular vision ranging method based on edge point information in an image, characterized by comprising initialization and the parallel processing of motion estimation, map construction and depth map estimation; the method specifically comprises the following steps:
1) initialization: two frames are selected from the image sequence captured by a downward-looking monocular camera fixedly attached to the UAV to construct an initial map and an initial depth map. The map is represented by a set of key frames {r_m} and a set of three-dimensional feature points {p_i}, where r denotes a key frame, the subscript m denotes the m-th key frame (m a positive integer), p denotes a three-dimensional feature point, and the subscript i denotes the i-th three-dimensional feature point (i a positive integer); the three-dimensional feature points correspond to corner features in the image, and the corresponding set of two-dimensional corners is written {u_{c,i}}, where u_c denotes a corner point and the subscript i denotes the i-th corner (i a positive integer). The depth map consists of edge point coordinates, edge point depth values and the uncertainty of the depth values; depth maps correspond one-to-one to key frames and are written D^m = {(u_{e,i}, d_{e,i}, σ_{e,i})}, where D denotes the depth map of the edge points, the superscript m denotes the m-th depth map (a positive integer), u_e denotes an edge point, d_e denotes the depth value of an edge point, σ_e denotes the uncertainty of the inverse depth of an edge point, and the subscript i denotes the i-th edge point (a positive integer);
taking the first frame as a first key frame in the map, taking a camera coordinate system corresponding to the first frame as a world coordinate system, and finishing initialization;
2) after map and depth map information is initialized, three threads of motion estimation, map construction and depth map estimation are performed in parallel: the motion estimation thread aligns with the current frame by using the known map and depth map information to obtain a ranging result, the existing map information is optimized according to the ranging result, and the map construction and depth map estimation thread simultaneously operate to maintain the map and depth map information.
The invention has the characteristics and beneficial effects that:
the monocular vision ranging method based on the edge point information in the image takes an image captured by a downward-looking monocular camera fixedly connected with an unmanned aerial vehicle as input, and outputs a ranging result of the unmanned aerial vehicle through on-board computer operation; the method is realized by three parallel threads, namely a motion estimation thread, a map construction thread and a depth map estimation thread; the motion estimation thread aligns the current motion pose with the current frame by using the existing information in the map and the depth map, is the core of a ranging algorithm and must ensure strong real-time performance; the map construction thread is used for maintaining the key frame information and the three-dimensional feature map corresponding to the image corner features; the depth map estimation thread is used to maintain a depth estimate of edge point information in the key frame.
The invention respectively runs the motion estimation, the map construction and the depth map estimation in three threads, fully utilizes the multi-core architecture of the modern processor and improves the algorithm efficiency. The motion estimation algorithm has no processes of feature extraction and matching, a rapid feature point extraction algorithm is adopted during map construction, and pixel points with large brightness gradient are directly extracted in depth map estimation, so that the calculated amount is effectively reduced. Meanwhile, in the initial estimation process of the motion estimation algorithm, the edge point information in the image is utilized, and a depth map propagation method of the edge point is correspondingly designed. The edge points in the image are image features which are much richer than the corner points, and the algorithm can reduce the dependence of the algorithm on the corner point features by adopting edge point information in the initial estimation and enhance the environment adaptability of the algorithm.
Drawings
FIG. 1 is a scene in which a monocular visual ranging method based on corner features may fail;
FIG. 2 is a general flow chart of the improved monocular vision ranging method based on edge points according to the present invention;
FIG. 3 is a flow chart of a motion estimation thread according to an embodiment of the present invention;
FIG. 4 is a flow diagram of a mapping thread according to an embodiment of the invention;
FIG. 5 is a flowchart of a depth map estimation thread according to an embodiment of the present invention;
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The general flow of the UAV monocular vision ranging method based on edge point information in an image is shown in FIG. 2; it comprises initialization and the parallel execution of motion estimation, map construction and depth map estimation, specifically including the following steps:
1) initialization: two frames are selected from the image sequence captured by a downward-looking monocular camera fixedly attached to the UAV to construct an initial map and an initial depth map. In this embodiment the map is represented by a set of key frames {r_m} and a set of three-dimensional feature points {p_i}, where r denotes a key frame, the subscript m denotes the m-th key frame (a positive integer), p denotes a three-dimensional feature point, and the subscript i denotes the i-th three-dimensional feature point (a positive integer); the three-dimensional feature points correspond to corner features in the image, and the corresponding set of two-dimensional corners is written {u_{c,i}}, where u_c denotes a corner point and the subscript i denotes the i-th corner (a positive integer);
in this embodiment a depth map is computed for every key frame in the map; the depth map consists of edge point coordinates, edge point depth values and the uncertainty of the depth values, i.e. it is written D^m = {(u_{e,i}, d_{e,i}, σ_{e,i})}, where D denotes the depth map of the edge points, the superscript m denotes the m-th depth map, a positive integer (depth maps correspond one-to-one to key frames), u_e denotes an edge point, d_e denotes the depth value of an edge point, σ_e denotes the uncertainty of the inverse depth of an edge point, and the subscript i denotes the i-th edge point (a positive integer).
The first frame is taken as the first key frame in the map, and the camera coordinate system of the first frame is taken as the world coordinate system (the world coordinate system is a coordinate system with a definite transformation relative to a real-world coordinate system such as a north-east-down frame; the camera coordinate system of the first frame is chosen as the world coordinate system for convenience of computation), which completes the initialization;
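For illustration only, the containers implied by this initialization can be sketched as follows in Python; the class and field names (EdgeDepthEntry, KeyFrame, SparseMap) are hypothetical and not prescribed by the patent.

```python
# Hedged sketch of the map / depth-map containers described above.
# All names are illustrative assumptions, not the patent's own API.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class EdgeDepthEntry:          # one element (u_e, d_e, sigma_e) of a depth map D^m
    u: np.ndarray              # 2D edge-point coordinate u_e (pixels)
    d: float                   # depth value d_e
    sigma: float               # uncertainty of the inverse depth, sigma_e

@dataclass
class KeyFrame:                # one key frame r_m with its associated data
    image: np.ndarray                                              # grey-scale image
    T_m_w: np.ndarray                                              # 4x4 pose in the world frame
    corners: List[np.ndarray] = field(default_factory=list)        # 2D corners {u_c,i}
    depth_map: List[EdgeDepthEntry] = field(default_factory=list)  # depth map D^m

@dataclass
class SparseMap:               # the map: key frames {r_m} plus 3D feature points {p_i}
    keyframes: List[KeyFrame] = field(default_factory=list)
    points: List[np.ndarray] = field(default_factory=list)

def initialize(first_frame: np.ndarray) -> SparseMap:
    """First frame becomes key frame 1; its camera frame is taken as the world frame."""
    m = SparseMap()
    m.keyframes.append(KeyFrame(image=first_frame, T_m_w=np.eye(4)))
    return m
```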
2) After the map and depth map information has been initialized, the three threads of motion estimation, map construction and depth map estimation run in parallel: the motion estimation thread aligns the current frame against the known map and depth map information to obtain the ranging result and optimizes the existing map information accordingly, while the map construction thread and the depth map estimation thread run concurrently to maintain the map and depth map information. The three threads are described in detail as follows:
21) Motion estimation thread: for the current frame image newly acquired by the camera, image alignment based on a motion model is carried out using the edge point depth information in the current key frame, yielding an initial estimate T^-_{k,w} of the pose T_{k,w} of the current frame in the world coordinate system, where T denotes a pose, the subscript k indicates that the current frame is the k-th image obtained by the camera (a positive integer), the subscript w denotes the world coordinate system, and the superscript '-' denotes an initial estimate. Using this initial estimate, the existing three-dimensional feature points {p_i} in the map are projected into the current image, and block matching yields the two-dimensional corners {u_{c,i}} corresponding to {p_i}. With {u_{c,i}} as measurements and T^-_{k,w} as the initial value, the pose T_{k,w} of the current frame in the world coordinate system w and the positions of the relevant three-dimensional feature points in the map are optimized in turn. The motion estimation embodiment is shown in FIG. 3 and comprises:
S11: denote the current frame image by I_k, where I denotes an image acquired by the camera. The current pose of the drone is first initialized to the pose of the previous moment, i.e. T^-_{k,w} = T_{k-1,w}. With the current key frame r_m as the reference frame, a Gauss-Newton iterative algorithm is used to minimize the weighted gray-scale residual between I_k and r_m, as in (1-1):

T_{k,m} = arg min_{T_{k,m}} Σ_i w_i(u_i) · δR(T_{k,m}, u_i)^2    (1-1)

which yields the pose T_{k,m} of the current frame I_k relative to the reference frame r_m. The sum runs over the set of edge points of the reference frame r_m whose inverse-depth uncertainty is below a set uncertainty threshold (1/200 of the maximum estimate of the inverse depth) and which are visible in the current frame; u denotes a pixel and the subscript i denotes the i-th pixel (a positive integer); w_i(u_i) is the weight of pixel u_i, determined by σ_i, the uncertainty of the inverse depth of the i-th pixel. The gray-scale residual δR(T_{k,m}, u_i) of pixel u_i is given by (1-2):

δR(T_{k,m}, u_i) = I_k( π( T_{k,m} · π^{-1}(u_i, d_i) ) ) − r_m(u_i)    (1-2)

where π is the projection function that projects three-dimensional points in the camera coordinate system into the two-dimensional image coordinate system, π^{-1} is the inverse of the projection function (back-projection with depth d_i), and I_k(·) and r_m(·) denote the intensities of the pixel in frame I_k and frame r_m, respectively. Combining this with the pose T_{m,w} of the reference frame r_m relative to the world coordinate system, the initial estimate of the pose of the current frame is computed as in (1-3):

T^-_{k,w} = T_{k,m} · T_{m,w}    (1-3)

If the number of edge points participating in the weighting is smaller than the set edge point threshold (300 in this embodiment), go to step S12; otherwise go directly to step S13;
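For illustration, the weighted photometric objective of (1-1)/(1-2) can be sketched as below; the pinhole model, the 1/σ weighting and all function names are assumptions made for this example rather than the patent's exact formulation, and a Gauss-Newton loop would minimize this cost over T_km.

```python
# Hedged sketch of the weighted gray-scale residual of eqs. (1-1)/(1-2).
import numpy as np

def bilinear(img, uv):
    """Bilinear intensity lookup at sub-pixel position uv = (u, v)."""
    u, v = uv
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    a, b = u - x0, v - y0
    return ((1 - a) * (1 - b) * img[y0, x0]     + a * (1 - b) * img[y0, x0 + 1] +
            (1 - a) * b       * img[y0 + 1, x0] + a * b       * img[y0 + 1, x0 + 1])

def project(K, p_cam):
    """pi: 3D point in the camera frame -> pixel coordinates."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

def backproject(K, uv, depth):
    """pi^{-1}: pixel plus depth -> 3D point in the camera frame."""
    return depth * np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])

def photometric_cost(T_km, I_k, r_m, edges, K):
    """edges: iterable of (u_i, d_i, sigma_i) taken from the key frame r_m."""
    cost = 0.0
    h, w = I_k.shape
    for u_i, d_i, sigma_i in edges:
        p_m = backproject(K, u_i, d_i)                 # 3D point in the key-frame camera
        p_k = (T_km @ np.append(p_m, 1.0))[:3]         # transform into the current frame
        if p_k[2] <= 0:
            continue
        u_k = project(K, p_k)
        if not (0 <= u_k[0] < w - 1 and 0 <= u_k[1] < h - 1):
            continue                                   # point not visible in I_k
        r = bilinear(I_k, u_k) - bilinear(r_m, u_i)    # gray residual, eq. (1-2)
        cost += (1.0 / sigma_i) * r * r                # weighted squared residual, eq. (1-1)
    return cost
```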
S12: using the pose estimate obtained in step S11 as the initial value, take the previous frame I_{k-1} as the reference frame of the current frame and perform image alignment, minimizing the gray-scale residual between I_k and I_{k-1} with a Gauss-Newton iterative optimization to obtain the pose T_{k,k-1} of the current frame relative to the previous frame, as in (1-4):

T_{k,k-1} = arg min_{T_{k,k-1}} Σ_i δR(T_{k,k-1}, u_{c,i})^2    (1-4)

where the sum runs over the set of corner points of frame k-1 (projected from the three-dimensional feature points in the map) whose depth is known and which are visible in the current frame. Combining this with the pose T_{k-1,w} of the previous frame relative to the world coordinate system, the initial estimate of the pose of the current frame is computed as in (1-5):

T^-_{k,w} = T_{k,k-1} · T_{k-1,w}    (1-5)

At this point the current frame I_k is selected as a new key frame and serves as the reference frame for the subsequent images; it also serves as input to the map construction thread and the depth map estimation thread;
S13: (using the initial pose estimate T^-_{k,w} of the current frame, the three-dimensional feature points in the map can be projected into the current frame to obtain a set of two-dimensional coordinates; however, because T^-_{k,w} contains errors, these coordinates are not necessarily the true imaging positions of the three-dimensional feature points in the current frame.) From the initial estimate T^-_{k,w}, determine the set {p_i} of three-dimensional feature points of the map observed by the current frame and obtain the estimated coordinates of the corresponding corner points with respect to the reference frame r_m. Using these estimates as initial values, the true imaging positions of {p_i} in the current frame are computed with the image tracking algorithm KLT (Kanade-Lucas-Tomasi feature tracker, a commonly used open algorithm), updating the two-dimensional corner set {u_{c,i}}, as in (1-6), where A_i denotes the transformation matrix that warps the reference image block into the current image, and the average gray-level difference between the reference block and the matching block in the current image is included to remove illumination effects (this step establishes, with sub-pixel precision, the correspondence between the three-dimensional feature points {p_i} in the map and the two-dimensional corners in the image);
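As a rough stand-in for this feature-alignment step, OpenCV's pyramidal KLT tracker can refine the projected corner positions; note that the patent's formulation additionally uses an affine warp A_i and a mean-intensity offset, which plain calcOpticalFlowPyrLK does not model, so the sketch below is an approximation.

```python
# Hedged sketch: refine projected corner positions in the current frame with
# OpenCV's pyramidal Lucas-Kanade tracker (an approximation of step S13).
import numpy as np
import cv2

def refine_corners_klt(ref_img, cur_img, ref_corners, projected_corners):
    """ref_corners: Nx2 corner positions in the reference key frame.
    projected_corners: Nx2 initial guesses in the current frame (from T^-_{k,w})."""
    p0 = ref_corners.astype(np.float32).reshape(-1, 1, 2)
    p1_init = projected_corners.astype(np.float32).reshape(-1, 1, 2)
    p1, status, err = cv2.calcOpticalFlowPyrLK(
        ref_img, cur_img, p0, p1_init,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW,        # start from the projected guesses
        winSize=(21, 21), maxLevel=3)
    good = status.reshape(-1) == 1
    return p1.reshape(-1, 2), good                 # refined positions + validity mask
```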
S14: the two-dimensional coordinates obtained in step S13 and the initial pose estimate T^-_{k,w} of the current frame no longer satisfy the geometric projection constraint, i.e. equation (1-7):

u_{c,i} = π( T_{k,w} · p_i )    (1-7)

With T^-_{k,w} as the initial value, the pose of the current frame is updated by minimizing the reprojection error with respect to T_{k,w}, i.e. the final result of the motion estimation, as in (1-8):

T_{k,w} = arg min_{T_{k,w}} Σ_i || u_{c,i} − π( T_{k,w} · p_i ) ||^2    (1-8)

T_{k,w} is the final ranging result. Go to step S15, and pass the current frame pose T_{k,w} obtained in step S14 to the map construction thread and the depth map estimation thread, respectively, to compute corner depth values and edge point depth values;
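A hedged sketch of the reprojection-error minimization in (1-8) is given below, using SciPy's least-squares solver and an axis-angle parameterization via cv2.Rodrigues; the parameterization and the solver choice are assumptions, not taken from the patent.

```python
# Hedged sketch of eq. (1-8): refine the current-frame pose by minimizing the
# reprojection error of the matched corners, starting from T^-_{k,w}.
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pose(T_init, points_3d, corners_2d, K):
    """T_init: 4x4 initial pose (world -> camera). points_3d: Nx3 world points p_i.
    corners_2d: Nx2 refined corner positions u_{c,i}. K: 3x3 intrinsics."""
    rvec0, _ = cv2.Rodrigues(T_init[:3, :3])
    x0 = np.hstack([rvec0.ravel(), T_init[:3, 3]])       # 6-vector [axis-angle, t]

    def residuals(x):
        R, _ = cv2.Rodrigues(x[:3])
        p_cam = points_3d @ R.T + x[3:]                  # transform into the camera frame
        uv = p_cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3]                      # pinhole projection pi(.)
        return (uv - corners_2d).ravel()                 # reprojection errors

    sol = least_squares(residuals, x0, method="lm")
    R, _ = cv2.Rodrigues(sol.x[:3])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = sol.x[3:]
    return T                                             # refined T_{k,w}
```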
S15: using the current frame pose T_{k,w} obtained in step S14, each three-dimensional feature point in the map that can be observed by the current frame is optimized individually, so that the sum of squared errors between its projections and its true imaging positions in all key frames that observe it is minimized, i.e. as in (1-9):

p_i = arg min_{p_i} Σ_j || u^j_{c,i} − π( T_{j,w} · p_i ) ||^2    (1-9)

where the superscript j denotes the j-th key frame in which the three-dimensional feature point p_i can be observed;
22) The map construction thread handles two situations. If the newly acquired image was selected as a new key frame in step S12 of the motion estimation thread, step S21 is executed: corner features are extracted from the newly added key frame and a probability-based depth filter is created for every newly extracted corner (i.e. the depth of a corner is described by a depth value and an uncertainty, and each new observation updates the depth value and the uncertainty of the corner through the depth filter). If the newly acquired image was not selected as a key frame, steps S22 to S24 are executed using the pose of the current frame obtained in step S14 of the motion estimation thread: the probabilistic depth filters of the corner features are updated with the information of the new image, and when the depth uncertainty of a corner falls below the set uncertainty threshold the corresponding three-dimensional feature point is added to the map. The embodiment of this thread is shown in FIG. 4; the specific steps are as follows: first judge whether the new image is a key frame; if yes, execute step S21, otherwise execute steps S22-S24;
S21: when a frame is selected as a key frame, corner features are extracted from it with the FAST corner extraction algorithm. The image is first divided evenly by a grid with equal spacing (e.g. 30x30 pixels); at most one corner is extracted per grid cell, so that the corners are distributed uniformly over the image and their total number does not exceed the number of cells. For each new corner u_{c,i}, a probabilistic depth filter is created; the joint posterior distribution of the inverse depth λ_i (where λ denotes the inverse of the depth of a corner and the subscript i denotes the i-th corner in the current key frame, a positive integer) and the probability ρ_i that an observation of this corner is valid (where ρ denotes the valid-observation probability of a corner and the subscript i denotes the i-th corner in the current key frame, a positive integer) is defined as in (2-1):

q( λ_i, ρ_i | a_n, b_n, μ_n, σ_n^2 ) = Beta( ρ_i | a_n, b_n ) · N( λ_i | μ_n, σ_n^2 )    (2-1)

where Beta and N denote the Beta distribution and the normal distribution, respectively, the subscript n indicates that the current probabilistic depth filter has undergone n parameter updates (n a positive integer), the parameters a_n and b_n are the numbers of valid and invalid observations accumulated during the recursive update, and μ_n and σ_n^2 are the mean and variance of the Gaussian distribution of the inverse depth. When the parameters of (2-1) are initialized, a_n and b_n take the initial value 10, μ_n is initialized to the inverse of the mean depth of the image, and σ_n^2 is initialized to a maximum value.
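For illustration, grid-based FAST extraction and the filter initialization described above might look as follows with OpenCV; the grid size, the FAST threshold, the DepthFilter fields and the choice of the "maximum" variance are assumptions.

```python
# Hedged sketch of step S21: one FAST corner per grid cell, plus
# initialization of a probabilistic depth filter (eq. 2-1) for each corner.
import numpy as np
import cv2
from dataclasses import dataclass

@dataclass
class DepthFilter:
    a: float       # accumulated "valid observation" count (Beta parameter a_n)
    b: float       # accumulated "invalid observation" count (Beta parameter b_n)
    mu: float      # mean of the inverse-depth Gaussian, mu_n
    sigma2: float  # variance of the inverse-depth Gaussian, sigma_n^2

def extract_grid_corners(img, cell=30, fast_threshold=20):
    """At most one FAST corner (the strongest) per cell x cell grid square."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold)
    keypoints = fast.detect(img, None)
    best = {}
    for kp in keypoints:
        key = (int(kp.pt[0]) // cell, int(kp.pt[1]) // cell)
        if key not in best or kp.response > best[key].response:
            best[key] = kp
    return [np.array(kp.pt) for kp in best.values()]

def init_depth_filters(corners, mean_scene_depth, max_inv_depth):
    """One filter per corner: a = b = 10, mu = 1/mean depth, sigma2 = 'maximum'.
    The range^2 / 36 choice for the maximum variance is an assumption; the
    patent only states that sigma_n^2 is initialized to a maximum value."""
    sigma2_max = (max_inv_depth ** 2) / 36.0
    return {tuple(u): DepthFilter(10.0, 10.0, 1.0 / mean_scene_depth, sigma2_max)
            for u in corners}
```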
S22: using the image I_k with known camera pose obtained in step S14, the depth estimates of the corner features are updated. Denote the corner being updated by u_{c,i}, located in key frame r_m. According to the relative pose between the current frame I_k and r_m, u_{c,i} determines a straight line in I_k on which the pixel of I_k corresponding to u_{c,i} is guaranteed to lie. A search is performed along this line (this process is commonly called an epipolar search); after the coordinates of the matching pixel have been obtained by block matching with sub-pixel precision, the depth of the corner u_{c,i} is computed by triangulation (a standard computation in stereo vision);
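The triangulation at the end of S22 can be illustrated with OpenCV's linear triangulation; the projection-matrix construction below is a standard, assumed implementation detail and is not taken from the patent.

```python
# Hedged sketch: triangulate the depth of a corner from its position in the
# key frame r_m and the matched position found by epipolar search in I_k.
import numpy as np
import cv2

def triangulate_depth(K, T_m_w, T_k_w, u_ref, u_cur):
    """T_*_w: 4x4 world->camera poses. u_ref / u_cur: 2D pixel in r_m / I_k.
    Returns the depth of the point expressed in the key frame r_m."""
    P_ref = K @ T_m_w[:3, :]                      # 3x4 projection matrix of r_m
    P_cur = K @ T_k_w[:3, :]                      # 3x4 projection matrix of I_k
    X_h = cv2.triangulatePoints(P_ref, P_cur,
                                np.asarray(u_ref, float).reshape(2, 1),
                                np.asarray(u_cur, float).reshape(2, 1))
    X_w = (X_h[:3] / X_h[3]).ravel()              # homogeneous -> 3D world point
    p_ref = T_m_w[:3, :3] @ X_w + T_m_w[:3, 3]    # express in the key-frame camera
    return p_ref[2]                               # depth along the optical axis
```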
S23: take the reciprocal λ_i^n of the depth obtained in step S22, where the subscript i denotes the i-th corner of the current key frame (a positive integer) and the superscript n indicates that this is the n-th measurement update of the depth filter of this corner (a positive integer). The measurement of the inverse depth is modeled as in (2-2):

p( λ_i^n | λ_i, ρ_i ) = ρ_i · N( λ_i^n | λ_i, τ_n^2 ) + (1 − ρ_i) · U( λ_i^n | λ_min, λ_max )    (2-2)

The meaning of this formula is: for the i-th corner of key frame r_m, if the measurement obtained through image I_k is valid, the reciprocal λ_i^n of the measured depth follows the normal distribution N(λ_i, τ_n^2), where τ_n^2 is the inverse-depth variance caused by a one-pixel error in the image plane; if it is an invalid measurement, λ_i^n follows the uniform distribution U(λ_min, λ_max), where λ_min and λ_max are the minimum and maximum of the inverse depth, set from prior information about the current scene. After the measurement update with λ_i^n, the posterior probability distribution of the inverse depth λ_i is as in (2-3):

p( λ_i, ρ_i | λ_i^1, …, λ_i^n ) = C · p( λ_i^n | λ_i, ρ_i ) · q( λ_i, ρ_i | a_{n-1}, b_{n-1}, μ_{n-1}, σ_{n-1}^2 )    (2-3)

where C is a constant that ensures the normalization of the probability distribution. The parameters a_n, b_n, μ_n and σ_n^2 of the approximating distribution (2-1) are then computed by moment matching (a statistical technique that approximates one distribution by another);
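As an illustration, the recursive update of this Gaussian x Beta filter admits a closed-form moment-matching step; the sketch below follows the update commonly attributed to Vogiatzis and Hernández (2011), which the filter described here resembles, and is an assumption rather than the patent's exact derivation.

```python
# Hedged sketch of the S23 filter update (eqs. 2-2/2-3): fuse one inverse-depth
# measurement x with variance tau2 into the Gaussian x Beta posterior by
# moment matching (Vogiatzis/Hernandez-style closed form).
import numpy as np

def update_depth_filter(f, x, tau2, inv_depth_range):
    """f: object with fields a, b, mu, sigma2 (see the S21 sketch).
    x: new inverse-depth measurement, tau2: its variance,
    inv_depth_range: width of the uniform outlier distribution (lambda_max - lambda_min)."""
    norm_pdf = lambda v, m, s2: np.exp(-0.5 * (v - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)

    s2 = 1.0 / (1.0 / f.sigma2 + 1.0 / tau2)           # fused Gaussian variance
    m = s2 * (f.mu / f.sigma2 + x / tau2)              # fused Gaussian mean

    c1 = f.a / (f.a + f.b) * norm_pdf(x, f.mu, f.sigma2 + tau2)   # inlier weight
    c2 = f.b / (f.a + f.b) / inv_depth_range                      # outlier weight
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)

    # First and second moments of rho under the (unnormalized) true posterior.
    fm = c1 * (f.a + 1) / (f.a + f.b + 1) + c2 * f.a / (f.a + f.b + 1)
    e = (c1 * (f.a + 1) * (f.a + 2) / ((f.a + f.b + 1) * (f.a + f.b + 2)) +
         c2 * f.a * (f.a + 1) / ((f.a + f.b + 1) * (f.a + f.b + 2)))

    mu_new = c1 * m + c2 * f.mu
    sigma2_new = c1 * (s2 + m * m) + c2 * (f.sigma2 + f.mu * f.mu) - mu_new * mu_new
    a_new = (e - fm) / (fm - e / fm)                   # match the Beta moments
    b_new = a_new * (1.0 - fm) / fm

    f.a, f.b, f.mu, f.sigma2 = a_new, b_new, mu_new, sigma2_new
    return f
```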
S24: corners whose valid-measurement probability is too low (less than 0.1) are rejected, while corners whose uncertainty σ_n meets the requirement (σ_n less than 1/200 of the maximum estimate of the inverse depth of the corner) are added to the map;
23) The depth map estimation thread handles two situations. If the newly acquired image was selected as a new key frame in step S12 of the motion estimation thread, edge points are extracted from the newly added key frame and steps S31 to S33 are executed. If the newly input image was not selected as a key frame, steps S34 to S36 are executed using the pose of the current frame obtained in step S14 of the motion estimation thread: the probabilistic depth filters of the edge points are updated with the new image information, and edge points whose valid-observation probability is too low are removed. The execution steps are shown in FIG. 5 and detailed as follows: first judge whether the new image is a key frame; if yes, execute steps S31-S33, otherwise execute steps S34-S36;
S31: in the newly added key frame, the Sobel operator (a common method for edge point detection) is used to compute the horizontal and vertical gray-level gradients G_x and G_y of the image as an approximation of the gray-level gradient map. The 2500 to 4000 pixels with the largest gray-level gradient are selected from the image; when the number of pixels with a significant gradient is below 2500, the gradient threshold (initially 450) is reduced to 0.95 times its previous value, and when the number exceeds 4000 the threshold is increased to 1.05 times its previous value;
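A hedged sketch of this edge-point selection with OpenCV follows; the gradient-magnitude approximation and the kernel size are assumptions, while the adaptation of the threshold follows the description above.

```python
# Hedged sketch of step S31: pick high-gradient pixels as edge points and
# adapt the gradient threshold so their count stays within [2500, 4000].
import numpy as np
import cv2

def extract_edge_points(img, threshold=450.0):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # horizontal gradient G_x
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)   # vertical gradient G_y
    mag = np.abs(gx) + np.abs(gy)                    # one common magnitude approximation
    ys, xs = np.nonzero(mag > threshold)
    count = len(xs)
    if count < 2500:
        threshold *= 0.95                            # too few strong-gradient pixels
    elif count > 4000:
        threshold *= 1.05                            # too many strong-gradient pixels
    order = np.argsort(mag[ys, xs])[::-1][:4000]     # keep at most the 4000 strongest
    points = np.stack([xs[order], ys[order]], axis=1)
    return points, threshold                         # edge points + updated threshold
```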
S32: denote the new key frame by r_m and the previous key frame by r_{m-1}; this step propagates the depth map D^{m-1} of frame r_{m-1} to the depth map D^m of the new key frame. In r_{m-1} the edge point u_{e,i} has inverse-depth mean μ_i and inverse-depth variance σ_i^2, where μ_i denotes the mean of the inverse depth of the edge point, σ_i^2 denotes the variance of the inverse depth, and the subscript i denotes the i-th edge point of frame r_{m-1}. The relative pose from r_{m-1} to r_m is T_{m,m-1}; the position u'_{e,i} of the edge point corresponding to u_{e,i} in frame r_m can be computed from the projection relation, as in (3-1):

u'_{e,i} = π( T_{m,m-1} · π^{-1}( u_{e,i}, 1/μ_i ) )    (3-1)

Denote the inverse depth of u'_{e,i} by μ'_i and its inverse-depth variance by σ'_i^2; then (3-2) gives the propagated inverse depth, where t_z denotes the translation of the camera along the optical axis, and from this the inverse-depth variance σ'_i^2 of u'_{e,i} follows as in (3-3), where σ_p^2 denotes the prediction uncertainty, set empirically. Because of projection errors, u'_{e,i} is generally not at an integer position; it is bound to the nearest pixel of r_m.
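The following sketch illustrates one common way such a propagation can be implemented: the depth is shifted by the motion along the optical axis and the variance is inflated by a prediction term, in the style of LSD-SLAM depth-map propagation. Since the patent's equations (3-2)/(3-3) are not reproduced here, the exact expressions below are assumptions consistent with the surrounding text.

```python
# Hedged sketch of step S32: propagate an edge point's inverse-depth estimate
# from the previous key frame r_{m-1} into the new key frame r_m.
import numpy as np

def propagate_edge_point(u, mu, sigma2, T_new_old, K, sigma_p2=0.05 ** 2):
    """u: 2D edge point in r_{m-1}; mu, sigma2: its inverse-depth mean/variance there.
    T_new_old: 4x4 relative pose T_{m,m-1}. sigma_p2: prediction uncertainty (assumed value).
    Returns (u_new, mu_new, sigma2_new) in r_m, or None if the point leaves the image plane."""
    d_old = 1.0 / mu
    p_old = d_old * np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])   # back-project
    p_new = T_new_old[:3, :3] @ p_old + T_new_old[:3, 3]             # into r_m's frame (eq. 3-1)
    if p_new[2] <= 0:
        return None
    uvw = K @ p_new
    u_new = np.round(uvw[:2] / uvw[2]).astype(int)      # bind to the nearest pixel

    d_new = p_new[2]          # new depth; differs from d_old by the motion along the optical axis
    mu_new = 1.0 / d_new
    # Variance propagation: scale by the squared Jacobian of the inverse-depth change,
    # then add the empirical prediction uncertainty sigma_p^2 (assumed form).
    sigma2_new = (mu_new / mu) ** 4 * sigma2 + sigma_p2
    return u_new, mu_new, sigma2_new
```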
S33: a probabilistic depth filter is created for every edge point extracted in step S31. If a prior estimate propagated in step S32 is stored within the 3x3 pixel neighbourhood of an edge point, the mean μ of its inverse depth is initialized from that prior estimate as in (3-4), and the inverse-depth uncertainty of the filter is initialized to the minimum uncertainty among the neighbouring prior estimates. If no prior estimate exists, the probabilistic depth filter is initialized to the average depth and the maximum uncertainty. A probabilistic depth filter is then constructed for every edge point according to (2-1), as in (3-5), where each parameter of (3-5) is defined as in (2-1), now referring to the probabilistic depth filter of an edge point: a_n and b_n take the initial value 10, and μ_n, σ_n are initialized to the values obtained above.
S34: the new key frame r_m is set as the reference frame for the subsequent images, which are used to perform measurement updates on the depth map D^m of frame r_m. When an image I_k is used to update D^m, denote the edge point being updated by u_{e,i}. According to the relative pose between I_k and r_m, u_{e,i} determines a straight line in I_k on which the pixel of I_k corresponding to u_{e,i} is guaranteed to lie. If the angle between the gray-gradient direction of the edge point and the direction of this line is smaller than a threshold (set to 25 degrees in this embodiment), block matching with sub-pixel precision is performed to obtain the coordinates of the corresponding pixel, the depth of the edge point is computed by triangulation, and step S35 is executed; if the angle is larger than the threshold, no search is performed and the procedure returns to the beginning of step S34 to check the next edge point.
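The angle test in S34 can be sketched as follows; the representation of the epipolar direction as a 2D unit vector is an assumed implementation detail.

```python
# Hedged sketch of the S34 gating test: only search along the epipolar line if
# the edge point's gradient direction is well aligned with that line.
import numpy as np

def epipolar_search_allowed(grad, epi_dir, max_angle_deg=25.0):
    """grad: (G_x, G_y) gray gradient at the edge point.
    epi_dir: 2D direction of the epipolar line in the current image."""
    g = np.asarray(grad, float)
    e = np.asarray(epi_dir, float)
    g /= np.linalg.norm(g)
    e /= np.linalg.norm(e)
    cos_angle = abs(float(g @ e))                    # sign-free alignment measure
    return cos_angle >= np.cos(np.radians(max_angle_deg))
```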
S35: take the reciprocal λ_i^n of the depth obtained in step S34, where the subscript i denotes the i-th edge point of the current key frame (a positive integer) and the superscript n indicates that this is the n-th measurement update of the depth filter of this edge point (a positive integer). The measurement of the inverse depth is modeled as in (3-6), in the same form as (2-2): for the i-th edge point of key frame r_m, if the measurement obtained through image I_k is valid, the reciprocal λ_i^n of the measured depth follows the normal distribution N(λ_i, τ_n^2); if it is an invalid measurement, λ_i^n follows the uniform distribution U(λ_min, λ_max), where λ_min and λ_max are the minimum and maximum of the inverse depth, set from prior information about the current scene. After the measurement update with λ_i^n, the posterior probability distribution of the inverse depth λ_i has the same form as (2-3), and a_n, b_n, μ_n and σ_n^2 are computed by the moment-matching method.
S36: edge points whose valid-measurement probability is too low (less than 0.1) are rejected.
The above description only illustrates preferred embodiments of the present invention and is not to be construed as limiting the invention. The scope of the invention is not restricted to these embodiments; those skilled in the art may make various changes, modifications, alterations and substitutions without departing from the spirit, principle and scope of the invention, and all equivalent technical means shall fall within the scope of the invention, which is defined by the appended claims and their equivalents.

Claims (4)

1. A UAV monocular vision ranging method based on edge point information in an image, characterized by comprising initialization and the parallel processing of motion estimation, map construction and depth map estimation; the method specifically comprises the following steps:
1) initialization: two frames are selected from the image sequence captured by a downward-looking monocular camera fixedly attached to the unmanned aerial vehicle to construct an initial map and an initial depth map. The map is represented by a set of key frames {r_m} and a set of three-dimensional feature points {p_i}, where r_m denotes the m-th key frame (m a positive integer) and p_i denotes the i-th three-dimensional feature point (i a positive integer); the three-dimensional feature points correspond one-to-one to the two-dimensional corner features in the image, and the set of two-dimensional corners is written {u_{c,i}}, where u_{c,i} denotes the i-th corner (i a positive integer). The depth map consists of edge point coordinates, edge point depth values and the uncertainty of the inverse depth; depth maps correspond one-to-one to key frames and are written D^m = {(u_{e,i}, d_{e,i}, σ_{e,i})}, where D^m denotes the depth map corresponding to the m-th key frame, u_{e,i} denotes the i-th edge point, d_{e,i} denotes the depth value of the i-th edge point, and σ_{e,i} denotes the uncertainty of the inverse depth of the i-th edge point;
taking a first frame as a first key frame in a map, taking a camera coordinate system corresponding to the first frame as a world coordinate system, and finishing initialization;
2) after map and depth map information is initialized, three threads of motion estimation, map construction and depth map estimation are performed in parallel: the method comprises the steps that a motion estimation thread aligns current frames newly acquired by a camera by using known map and depth map information to obtain a range finding result, existing map information is optimized according to the range finding result, and the map construction and depth map estimation thread runs simultaneously to maintain the map and depth map information.
2. The method as claimed in claim 1, wherein the motion estimation thread among the three parallel threads of motion estimation, map construction and depth map estimation in step 2) comprises the following specific steps:
image alignment based on a motion model is carried out for the current frame image newly acquired by the camera using the edge point depth information in the current key frame, yielding an initial estimate T^-_{k,w} of the pose T_{k,w} of the current frame I_k in the world coordinate system, where I_k denotes the k-th image acquired by the camera, i.e. the current frame (k a positive integer), T_{k,w} denotes the pose of the current frame in the world coordinate system, and T^-_{k,w} denotes its initial estimate; using this initial estimate, the existing three-dimensional feature points in the map are projected into the current image to obtain the set {p_i} of three-dimensional feature points observable by the current frame, and block matching yields the corresponding two-dimensional corner set {u_{c,i}}; with {u_{c,i}} as observations and T^-_{k,w} as the initial value, the pose T_{k,w} of the current frame in the world coordinate system w and the positions of the relevant three-dimensional feature points in the map are optimized in turn;
S11: denote the current frame image by I_k; the current pose of the drone is first initialized to the pose of the previous moment, i.e. T^-_{k,w} = T_{k-1,w}; with the current key frame r_m as the reference frame, a Gauss-Newton iterative algorithm is used to minimize the weighted gray-scale residual between I_k and r_m, as in (1-1):

T_{k,m} = arg min_{T_{k,m}} Σ_i w_i(u_{e,i}) · δR(T_{k,m}, u_{e,i})^2    (1-1)

which yields the pose T_{k,m} of the current frame I_k relative to the reference frame r_m, where the sum runs over the set of edge points of the reference frame r_m whose inverse-depth uncertainty is below the set uncertainty threshold and which are visible in the current frame, u_{e,i} denotes the i-th visible edge point, w_i(u_{e,i}) denotes its weight, determined by σ_{e,i}, the uncertainty of the inverse depth of the i-th edge point, and δR(T_{k,m}, u_{e,i}) denotes the gray-scale residual of the edge point u_{e,i}, given by (1-2):

δR(T_{k,m}, u_{e,i}) = I_k( π( T_{k,m} · π^{-1}(u_{e,i}, d_{e,i}) ) ) − r_m(u_{e,i})    (1-2)

where π is the projection function that projects three-dimensional points in the camera coordinate system into the two-dimensional image coordinate system, π^{-1} is the inverse of the projection function, and I_k(·) and r_m(·) denote the intensities of the pixel in frame I_k and frame r_m, respectively; combining this with the pose T_{m,w} of the reference frame r_m relative to the world coordinate system, the initial estimate of the pose of the current frame is computed as in (1-3):

T^-_{k,w} = T_{k,m} · T_{m,w}    (1-3)
if the number of the edge points participating in the weighting is less than the set edge point threshold, turning to the step S12, otherwise, directly entering the step S13;
S12: using the pose estimate obtained in step S11 as the initial value, take the previous frame I_{k-1} as the reference frame of the current frame and perform image alignment, minimizing the gray-scale residual between I_k and I_{k-1} with a Gauss-Newton iterative optimization to obtain the pose T_{k,k-1} of the current frame relative to the previous frame, as in (1-4):

T_{k,k-1} = arg min_{T_{k,k-1}} Σ_i δR(T_{k,k-1}, u_{c,i})^2    (1-4)

where the sum runs over the set of corner points of frame k-1 whose depth is known and which are visible in the current frame, u_{c,i} denoting the i-th such corner; combining this with the pose T_{k-1,w} of the previous frame relative to the world coordinate system, the initial estimate of the pose of the current frame is computed as in (1-5):

T^-_{k,w} = T_{k,k-1} · T_{k-1,w}    (1-5)

at this point the current frame I_k is selected as a new key frame and serves as the reference frame for the subsequent images; it also serves as input to the map construction thread and the depth map estimation thread;
S13: from the initial estimate T^-_{k,w}, determine the set {p_i} of three-dimensional feature points of the map observed by the current frame and obtain the estimated coordinates of the corresponding corner points with respect to the reference frame r_m; using these estimates as initial values, the true imaging positions of {p_i} in the current frame are computed with the image tracking algorithm KLT, updating the two-dimensional corner set {u_{c,i}}, as in (1-6), where A_i denotes the transformation matrix warping the reference image block into the current image, and the average gray-level difference between the reference block and the matching block in the current image is included to remove illumination effects;
S14: the two-dimensional coordinates obtained in step S13 and the initial pose estimate T^-_{k,w} of the current frame no longer satisfy equation (1-7):

u_{c,i} = π( T_{k,w} · p_i )    (1-7)

with T^-_{k,w} as the initial value, the pose of the current frame is updated by minimizing the reprojection error with respect to T_{k,w}, i.e. the final result of the motion estimation, as in (1-8):

T_{k,w} = arg min_{T_{k,w}} Σ_i || u_{c,i} − π( T_{k,w} · p_i ) ||^2    (1-8)

T_{k,w} is the final ranging result; go to step S15, and pass the current frame pose T_{k,w} obtained in step S14 as input to the map construction thread and the depth map estimation thread to compute corner depth values and edge point depth values;
S15: using the current frame pose T_{k,w} obtained in step S14, each three-dimensional feature point in the map that can be observed by the current frame is optimized individually, so that the sum of squared errors between its projections and its true imaging positions in all key frames that observe it is minimized, i.e. as in (1-9):

p_i = arg min_{p_i} Σ_j || u^j_{c,i} − π( T_{j,w} · p_i ) ||^2    (1-9)

where the superscript j denotes the j-th key frame in which the three-dimensional feature point p_i can be observed.
3. The method as claimed in claim 2, wherein the map construction thread among the three parallel threads of motion estimation, map construction and depth map estimation in step 2) comprises the following specific steps:
the map construction thread handles two situations: if the newly acquired image was selected as a new key frame in step S12 of the motion estimation thread, step S21 is executed; otherwise, steps S22-S24 are executed using the pose of the current frame obtained in step S14 of the motion estimation thread; the specific steps are as follows:
S21: when a frame is selected as a key frame, corner features are extracted from it with the FAST algorithm; the image is first divided evenly by a grid with equal spacing, and at most one corner is extracted per grid cell so that the corners are distributed uniformly over the image and their total number does not exceed the number of cells; for each new corner u_{c,i}, a probabilistic depth filter is created, and the joint posterior distribution of the inverse depth λ_i and the probability ρ_i that an observation is valid is defined as in (2-1):

q( λ_i, ρ_i | a_n, b_n, μ_n, σ_n^2 ) = Beta( ρ_i | a_n, b_n ) · N( λ_i | μ_n, σ_n^2 )    (2-1)

where Beta and N denote the Beta distribution and the normal distribution, respectively, the subscript n indicates that the current probabilistic depth filter has undergone n parameter updates (n a positive integer), the parameters a_n and b_n are the numbers of valid and invalid observations accumulated during the recursive update, and μ_n and σ_n^2 are the mean and variance of the Gaussian distribution of the inverse depth; when the parameters of (2-1) are initialized, a_n and b_n take the initial value 10, μ_n is initialized to the inverse of the mean depth of the image, and σ_n^2 is initialized to a maximum value;
S22: using the image I_k with known camera pose obtained in step S14, the depth estimates of the corner features are updated; denote the corner being updated by u_{c,i}, located in key frame r_c; according to the relative pose between the current frame I_k and r_c, u_{c,i} determines a straight line in I_k on which the pixel of I_k corresponding to u_{c,i} is guaranteed to lie; an epipolar search is performed along this line, and after the coordinates of the matching pixel have been obtained by block matching with sub-pixel precision, the depth of the corner u_{c,i} is computed by triangulation;
S23: take the reciprocal λ_i^n of the depth obtained in step S22, where the superscript n indicates that this is the n-th observation update of the depth filter of the current corner; the measurement of the inverse depth is modeled as in (2-2):

p( λ_i^n | λ_i, ρ_i ) = ρ_i · N( λ_i^n | λ_i, τ_n^2 ) + (1 − ρ_i) · U( λ_i^n | λ_min, λ_max )    (2-2)

the meaning of which is: for the i-th corner of key frame r_c, if the measurement obtained through image I_k is valid, the reciprocal λ_i^n of the measured depth follows the normal distribution N(λ_i, τ_n^2), where τ_n^2 is the inverse-depth variance caused by a one-pixel error in the image plane; if it is an invalid measurement, λ_i^n follows the uniform distribution U(λ_min, λ_max), where λ_min and λ_max are the minimum and maximum of the inverse depth, set from prior information about the current scene; after the observation update with λ_i^n, the posterior probability distribution of the inverse depth λ_i is as in (2-3):

p( λ_i, ρ_i | λ_i^1, …, λ_i^n ) = C · p( λ_i^n | λ_i, ρ_i ) · q( λ_i, ρ_i | a_{n-1}, b_{n-1}, μ_{n-1}, σ_{n-1}^2 )    (2-3)

where C is a constant ensuring the normalization of the probability distribution, and a_n, b_n, μ_n and σ_n^2 are computed by the moment-matching method;
S24: corners whose valid-measurement probability is less than 0.1 are rejected, and corners whose uncertainty σ_n meets the requirement are added to the map.
4. The method as claimed in claim 3, wherein the depth map estimation thread among the three parallel threads of motion estimation, map construction and depth map estimation in step 2) comprises the following specific steps:
if the newly acquired image was selected as a new key frame in step S12 of the motion estimation thread, edge points are extracted from the newly added key frame and steps S31-S33 are executed; otherwise, steps S34-S36 are executed using the pose of the current frame obtained in step S14 of the motion estimation thread, the probabilistic depth filters of the edge points are updated with the new image information, and edge points whose valid-observation probability is too low are removed; the specific steps are as follows:
S31: in the newly added key frame, edge points are extracted: the Sobel operator is used to compute the horizontal and vertical gray-level gradients G_x and G_y of the image as an approximation of the gray-level gradient map; the 2500 to 4000 pixels with the largest gray-level gradient are selected from the image; when the number of pixels with a significant gradient is below 2500, the gradient threshold (initially 450) is reduced to 0.95 times its previous value, and when the number exceeds 4000 the threshold is increased to 1.05 times its previous value;
S32: denote the new key frame by r_v and the previous key frame by r_{v-1}; the depth map D^{v-1} of frame r_{v-1} is propagated to the depth map D^v of the new key frame; in r_{v-1}, the edge point u_{e,i} has inverse-depth mean μ_i and inverse-depth variance σ_i^2, the subscript i denoting the i-th edge point of frame r_{v-1}; the relative pose from r_{v-1} to r_v is T_{v,v-1}, and the position u'_{e,i} of the edge point corresponding to u_{e,i} in frame r_v is computed from the projection relation, as in (3-1):

u'_{e,i} = π( T_{v,v-1} · π^{-1}( u_{e,i}, 1/μ_i ) )    (3-1)

denote the inverse depth of u'_{e,i} by μ'_i and its inverse-depth variance by σ'_i^2; then (3-2) gives the propagated inverse depth, where t_z denotes the translation of the camera along the optical axis, and from this the inverse-depth variance σ'_i^2 of u'_{e,i} follows as in (3-3), where σ_p^2 denotes the prediction uncertainty; u'_{e,i} is bound to the nearest integer pixel of r_v;
S33: a probabilistic depth filter is created for every edge point extracted in step S31; if a prior estimate is stored within the 3x3 pixel neighbourhood of an edge point, the mean μ of its inverse depth is initialized from that prior estimate as in (3-4), and the inverse-depth uncertainty of the edge point's depth filter is initialized to the minimum uncertainty among the prior estimates; if no prior estimate exists, the probabilistic depth filter of the edge point is initialized to the average depth and the maximum uncertainty; a probabilistic depth filter is then constructed for each edge point according to (2-1), as in (3-5), where each parameter of (3-5) is defined as in (2-1), now referring to the depth filter of an edge point: a_n and b_n take the initial value 10, and μ_n, σ_n are initialized to the values obtained above;
S34: the new key frame r_v is set as the reference frame for the subsequent images, which are used to perform observation updates on the depth map D^v of frame r_v; when an image I_k is used to update D^v, denote the edge point being updated by u_{e,i}; according to the relative pose between I_k and r_v, u_{e,i} determines a straight line in I_k on which the pixel of I_k corresponding to u_{e,i} is guaranteed to lie; if the angle between the gray-gradient direction of the edge point and the direction of this line is smaller than the threshold, block matching with sub-pixel precision is performed to obtain the coordinates of the corresponding pixel, the depth of the edge point is computed by triangulation, and step S35 is executed; if the angle is larger than the threshold, no search is performed and the procedure returns to step S34 to check the next edge point;
S35: take the reciprocal λ_i^n of the depth obtained in step S34; the measurement of the inverse depth is modeled as in (3-6):

p( λ_i^n | λ_i, ρ_i ) = ρ_i · N( λ_i^n | λ_i, τ_n^2 ) + (1 − ρ_i) · U( λ_i^n | λ_min, λ_max )    (3-6)

the meaning of which is: for the i-th edge point of key frame r_v, if the measurement obtained through image I_k is valid, the reciprocal λ_i^n of the measured depth follows the normal distribution N(λ_i, τ_n^2); if it is an invalid measurement, λ_i^n follows the uniform distribution U(λ_min, λ_max), where λ_min and λ_max are the minimum and maximum of the inverse depth, set from prior information about the current scene; after the observation update with λ_i^n, the posterior probability distribution of the inverse depth λ_i is as in (2-3), and a_n, b_n, μ_n and σ_n^2 are computed by the moment-matching method;
S36: edge points whose valid-measurement probability is less than 0.1 are rejected.
CN201610131438.2A 2016-03-08 2016-03-08 A kind of monocular vision ranging method based on point information in edge in image Active CN105809687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610131438.2A CN105809687B (en) 2016-03-08 2016-03-08 A kind of monocular vision ranging method based on point information in edge in image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610131438.2A CN105809687B (en) 2016-03-08 2016-03-08 A kind of monocular vision ranging method based on point information in edge in image

Publications (2)

Publication Number Publication Date
CN105809687A CN105809687A (en) 2016-07-27
CN105809687B true CN105809687B (en) 2019-09-27

Family

ID=56467948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610131438.2A Active CN105809687B (en) 2016-03-08 2016-03-08 A kind of monocular vision ranging method based on point information in edge in image

Country Status (1)

Country Link
CN (1) CN105809687B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570820B (en) * 2016-10-18 2019-12-03 浙江工业大学 A kind of monocular vision three-dimensional feature extracting method based on quadrotor drone
CN106652028B (en) * 2016-12-28 2020-07-03 深圳乐动机器人有限公司 Environment three-dimensional mapping method and device
CN110177532A (en) * 2017-01-22 2019-08-27 四川金瑞麒智能科学技术有限公司 A kind of intelligent wheelchair system based on big data and artificial intelligence
CN110099656A (en) * 2017-01-22 2019-08-06 四川金瑞麒智能科学技术有限公司 System and method for controlling intelligent wheel chair
CN106920279B (en) * 2017-03-07 2018-06-19 百度在线网络技术(北京)有限公司 Three-dimensional map construction method and device
CN107341814B (en) * 2017-06-14 2020-08-18 宁波大学 Four-rotor unmanned aerial vehicle monocular vision range measurement method based on sparse direct method
CN107687850B (en) * 2017-07-26 2021-04-23 哈尔滨工业大学深圳研究生院 Unmanned aerial vehicle pose estimation method based on vision and inertia measurement unit
CN107527366B (en) * 2017-08-23 2020-04-10 上海视智电子科技有限公司 Camera tracking method for depth camera
CN107888828B (en) * 2017-11-22 2020-02-21 杭州易现先进科技有限公司 Space positioning method and device, electronic device, and storage medium
WO2019104571A1 (en) * 2017-11-30 2019-06-06 深圳市大疆创新科技有限公司 Image processing method and device
CN108151728A (en) * 2017-12-06 2018-06-12 华南理工大学 A kind of half dense cognitive map creation method for binocular SLAM
CN108398139B (en) * 2018-03-01 2021-07-16 北京航空航天大学 Dynamic environment vision mileometer method fusing fisheye image and depth image
CN108615244B (en) * 2018-03-27 2019-11-15 中国地质大学(武汉) A kind of image depth estimation method and system based on CNN and depth filter
CN108615246B (en) * 2018-04-19 2021-02-26 浙江大承机器人科技有限公司 Method for improving robustness of visual odometer system and reducing calculation consumption of algorithm
CN108759833B (en) * 2018-04-25 2021-05-25 中国科学院合肥物质科学研究院 Intelligent vehicle positioning method based on prior map
CN108986037B (en) * 2018-05-25 2020-06-16 重庆大学 Monocular vision odometer positioning method and positioning system based on semi-direct method
CN108765481B (en) * 2018-05-25 2021-06-11 亮风台(上海)信息科技有限公司 Monocular video depth estimation method, device, terminal and storage medium
GB2577062B (en) * 2018-09-11 2021-04-28 Advanced Risc Mach Ltd Methods, apparatus and processor for producing a higher resolution frame
CN111260698B (en) * 2018-12-03 2024-01-02 北京魔门塔科技有限公司 Binocular image feature matching method and vehicle-mounted terminal
CN109798897B (en) * 2019-01-22 2022-07-01 广东工业大学 Method for improving monocular vision positioning reliability through environment model integrity evaluation
CN110044358B (en) * 2019-04-29 2020-10-02 清华大学 Mobile robot positioning method based on field line characteristics
CN110135376A (en) * 2019-05-21 2019-08-16 北京百度网讯科技有限公司 Determine method, equipment and the medium of the coordinate system conversion parameter of imaging sensor
CN110349212B (en) * 2019-06-28 2023-08-25 Oppo广东移动通信有限公司 Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN110428462B (en) * 2019-07-17 2022-04-08 清华大学 Multi-camera stereo matching method and device
CN112364677A (en) * 2020-11-23 2021-02-12 盛视科技股份有限公司 Robot vision positioning method based on two-dimensional code
CN112632426B (en) * 2020-12-22 2022-08-30 新华三大数据技术有限公司 Webpage processing method and device
CN112907742B (en) * 2021-02-18 2024-07-16 湖南国科微电子股份有限公司 Visual synchronous positioning and mapping method, device, equipment and medium
CN113689400B (en) * 2021-08-24 2024-04-19 凌云光技术股份有限公司 Method and device for detecting profile edge of depth image section
CN113793417A (en) * 2021-09-24 2021-12-14 东北林业大学 Monocular SLAM method capable of creating large-scale map
CN115273538A (en) * 2022-08-29 2022-11-01 王炜程 GNSS-RTK technology-based parking space detection system and deployment and working methods thereof
CN116993740B (en) * 2023-09-28 2023-12-19 山东万世机械科技有限公司 Concrete structure surface defect detection method based on image data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377405A (en) * 2008-07-11 2009-03-04 北京航空航天大学 Vision measuring method of space round gesture parameter and geometric parameter
CN102128617A (en) * 2010-12-08 2011-07-20 中国科学院自动化研究所 Vision real-time measuring method based on color code block
CN103926933A (en) * 2014-03-29 2014-07-16 北京航空航天大学 Indoor simultaneous locating and environment modeling method for unmanned aerial vehicle
CN105157708A (en) * 2015-10-10 2015-12-16 南京理工大学 Unmanned aerial vehicle autonomous navigation system and method based on image processing and radar
CN105225241A (en) * 2015-09-25 2016-01-06 广州极飞电子科技有限公司 The acquisition methods of unmanned plane depth image and unmanned plane

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9981605B2 (en) * 2014-05-16 2018-05-29 GM Global Technology Operations LLC Surround-view camera system (VPM) and vehicle dynamic

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377405A (en) * 2008-07-11 2009-03-04 北京航空航天大学 Vision measuring method of space round gesture parameter and geometric parameter
CN102128617A (en) * 2010-12-08 2011-07-20 中国科学院自动化研究所 Vision real-time measuring method based on color code block
CN103926933A (en) * 2014-03-29 2014-07-16 北京航空航天大学 Indoor simultaneous locating and environment modeling method for unmanned aerial vehicle
CN105225241A (en) * 2015-09-25 2016-01-06 广州极飞电子科技有限公司 The acquisition methods of unmanned plane depth image and unmanned plane
CN105157708A (en) * 2015-10-10 2015-12-16 南京理工大学 Unmanned aerial vehicle autonomous navigation system and method based on image processing and radar

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sayanan Sivaraman et al.; "Combining Monocular and Stereo-Vision for Real-Time Vehicle Ranging and Tracking on Multilane Highways"; 2011 14th International IEEE Conference on Intelligent Transportation Systems; 2011-10-07; pp. 1249-1254 *
刘艳丽; "Research on three-dimensional simultaneous localization and mapping fusing color and depth information"; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-12-15; Chapter 5 *

Also Published As

Publication number Publication date
CN105809687A (en) 2016-07-27

Similar Documents

Publication Publication Date Title
CN105809687B (en) A kind of monocular vision ranging method based on point information in edge in image
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN109211241B (en) Unmanned aerial vehicle autonomous positioning method based on visual SLAM
CN108242079B (en) VSLAM method based on multi-feature visual odometer and graph optimization model
WO2020113423A1 (en) Target scene three-dimensional reconstruction method and system, and unmanned aerial vehicle
CN108229416B (en) Robot SLAM method based on semantic segmentation technology
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN114018236A (en) Laser vision strong coupling SLAM method based on adaptive factor graph
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
CN110749308A (en) SLAM-oriented outdoor positioning method using consumer-grade GPS and 2.5D building models
CN109871024A (en) A kind of UAV position and orientation estimation method based on lightweight visual odometry
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
CN110598370A (en) Robust attitude estimation of multi-rotor unmanned aerial vehicle based on SIP and EKF fusion
CN114529585A (en) Mobile equipment autonomous positioning method based on depth vision and inertial measurement
CN112945233A (en) Global drift-free autonomous robot simultaneous positioning and map building method
CN114812601A (en) State estimation method and device of visual inertial odometer and electronic equipment
CN117237789A (en) Method for generating texture information point cloud map based on panoramic camera and laser radar fusion
KR101766823B1 (en) Robust visual odometry system and method to irregular illumination changes
CN117330052A (en) Positioning and mapping method and system based on infrared vision, millimeter wave radar and IMU fusion
CN116704032A (en) Outdoor visual SLAM method based on monocular depth estimation network and GPS
CN117671022B (en) Mobile robot vision positioning system and method in indoor weak texture environment
CN113744301B (en) Motion trail estimation method and device for mobile robot and storage medium
Jia et al. Mobile robot vision odometer based on point-line features and graph optimization
CN116399350B (en) Method for determining semi-direct method visual odometer fused with YOLOv5
Macesanu et al. Computer vision based Mobile robot navigation in unknown environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant